Update docs to reflect Doc input to Language (#11555)

Paul O'Leary McCann 2022-09-29 18:50:29 +09:00 committed by GitHub
parent 6d7630c5d3
commit ba63f57f81

@@ -164,6 +164,9 @@ examples, see the
 Apply the pipeline to some text. The text can span multiple sentences, and can
 contain arbitrary whitespace. Alignment into the original string is preserved.
 
+Instead of text, a `Doc` can be passed as input, in which case tokenization is
+skipped, but the rest of the pipeline is run.
+
 > #### Example
 >
 > ```python
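
The example snippet in the hunk above is truncated in this view. As an illustration of the added sentences, a minimal sketch of calling the pipeline on a pre-built `Doc` (the pipeline name `en_core_web_sm` is an assumption for illustration, not part of the diff):

```python
import spacy
from spacy.tokens import Doc

# Assumed pipeline name; any trained pipeline works the same way.
nlp = spacy.load("en_core_web_sm")

# Passing a string runs the full pipeline, including tokenization.
doc_from_text = nlp("This is a sentence. This is another one.")

# Passing a Doc skips tokenization, but the remaining components
# (tagger, parser, ner, ...) still run over the pre-built tokens.
words = ["This", "is", "a", "sentence", "."]
spaces = [True, True, True, False, False]
premade = Doc(nlp.vocab, words=words, spaces=spaces)
doc_from_doc = nlp(premade)

print([t.pos_ for t in doc_from_doc])
```
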
@@ -173,7 +176,7 @@ contain arbitrary whitespace. Alignment into the original string is preserved.
 | Name            | Description |
 | --------------- | ----------- |
-| `text`          | The text to be processed. ~~str~~ |
+| `text`          | The text to be processed, or a Doc. ~~Union[str, Doc]~~ |
 | _keyword-only_  | |
 | `disable`       | Names of pipeline components to [disable](/usage/processing-pipelines#disabling). ~~List[str]~~ |
 | `component_cfg` | Optional dictionary of keyword arguments for components, keyed by component names. Defaults to `None`. ~~Optional[Dict[str, Dict[str, Any]]]~~ |
@@ -184,6 +187,9 @@ contain arbitrary whitespace. Alignment into the original string is preserved.
 Process texts as a stream, and yield `Doc` objects in order. This is usually
 more efficient than processing texts one-by-one.
 
+Instead of text, a `Doc` object can be passed as input. In this case
+tokenization is skipped but the rest of the pipeline is run.
+
 > #### Example
 >
 > ```python
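
The diff's own example is again cut off here. A sketch of what the added description implies for `nlp.pipe` (pipeline name assumed; mixing `str` and `Doc` inputs in one stream follows from the `Iterable[Union[str, Doc]]` type in the table below):

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.load("en_core_web_sm")  # assumed pipeline name

# Strings are tokenized as usual; Doc inputs skip tokenization but are
# still processed by the rest of the pipeline.
inputs = [
    "This is the first text.",
    Doc(nlp.vocab, words=["A", "premade", "Doc", "."],
        spaces=[True, True, False, False]),
    "And a final string.",
]
for doc in nlp.pipe(inputs):
    print(doc.text, doc.has_annotation("TAG"))
```
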
@@ -194,7 +200,7 @@ more efficient than processing texts one-by-one.
 | Name           | Description |
 | -------------- | ----------- |
-| `texts`        | A sequence of strings. ~~Iterable[str]~~ |
+| `texts`        | A sequence of strings (or `Doc` objects). ~~Iterable[Union[str, Doc]]~~ |
 | _keyword-only_ | |
 | `as_tuples`    | If set to `True`, inputs should be a sequence of `(text, context)` tuples. Output will then be a sequence of `(doc, context)` tuples. Defaults to `False`. ~~bool~~ |
 | `batch_size`   | The number of texts to buffer. ~~Optional[int]~~ |
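
For the `as_tuples` and `batch_size` parameters in the table above, a short usage sketch (the pipeline name and the context dictionaries are illustrative assumptions):

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed pipeline name

# With as_tuples=True, inputs are (text, context) pairs and the output
# yields (doc, context) pairs, keeping metadata aligned with each Doc.
data = [
    ("This is a text about cats.", {"id": 1}),
    ("And this one is about dogs.", {"id": 2}),
]
for doc, context in nlp.pipe(data, as_tuples=True, batch_size=50):
    print(context["id"], [ent.text for ent in doc.ents])
```
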