mirror of
https://github.com/explosion/spaCy.git
Merge branch 'develop' into spacy.io
commit
8458379cf5
86
README.md
|
@ -6,7 +6,7 @@ spaCy is a library for advanced Natural Language Processing in Python and
|
||||||
Cython. It's built on the very latest research, and was designed from day one
|
Cython. It's built on the very latest research, and was designed from day one
|
||||||
to be used in real products. spaCy comes with
|
to be used in real products. spaCy comes with
|
||||||
[pre-trained statistical models](https://spacy.io/models) and word vectors, and
|
[pre-trained statistical models](https://spacy.io/models) and word vectors, and
|
||||||
currently supports tokenization for **30+ languages**. It features the
|
currently supports tokenization for **45+ languages**. It features the
|
||||||
**fastest syntactic parser** in the world, convolutional
|
**fastest syntactic parser** in the world, convolutional
|
||||||
**neural network models** for tagging, parsing and **named entity recognition**
|
**neural network models** for tagging, parsing and **named entity recognition**
|
||||||
and easy **deep learning** integration. It's commercial open-source software,
|
and easy **deep learning** integration. It's commercial open-source software,
|
||||||
|
@ -20,29 +20,30 @@ released under the MIT license.
|
||||||
[![pypi Version](https://img.shields.io/pypi/v/spacy.svg?style=flat-square)](https://pypi.python.org/pypi/spacy)
|
[![pypi Version](https://img.shields.io/pypi/v/spacy.svg?style=flat-square)](https://pypi.python.org/pypi/spacy)
|
||||||
[![conda Version](https://img.shields.io/conda/vn/conda-forge/spacy.svg?style=flat-square)](https://anaconda.org/conda-forge/spacy)
|
[![conda Version](https://img.shields.io/conda/vn/conda-forge/spacy.svg?style=flat-square)](https://anaconda.org/conda-forge/spacy)
|
||||||
[![Python wheels](https://img.shields.io/badge/wheels-%E2%9C%93-4c1.svg?longCache=true&style=flat-square&logo=python&logoColor=white)](https://github.com/explosion/wheelwright/releases)
|
[![Python wheels](https://img.shields.io/badge/wheels-%E2%9C%93-4c1.svg?longCache=true&style=flat-square&logo=python&logoColor=white)](https://github.com/explosion/wheelwright/releases)
|
||||||
|
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg?style=flat-square)](https://github.com/ambv/black)
|
||||||
[![spaCy on Twitter](https://img.shields.io/twitter/follow/spacy_io.svg?style=social&label=Follow)](https://twitter.com/spacy_io)
|
[![spaCy on Twitter](https://img.shields.io/twitter/follow/spacy_io.svg?style=social&label=Follow)](https://twitter.com/spacy_io)
|
||||||
|
|
||||||
## 📖 Documentation
|
## 📖 Documentation
|
||||||
|
|
||||||
| Documentation | |
|
| Documentation | |
|
||||||
| --- | --- |
|
| --------------- | -------------------------------------------------------------- |
|
||||||
| [spaCy 101] | New to spaCy? Here's everything you need to know!
|
| [spaCy 101] | New to spaCy? Here's everything you need to know! |
|
||||||
| [Usage Guides] | How to use spaCy and its features. |
|
| [Usage Guides] | How to use spaCy and its features. |
|
||||||
| [New in v2.0] | New features, backwards incompatibilities and migration guide. |
|
| [New in v2.1] | New features, backwards incompatibilities and migration guide. |
|
||||||
| [API Reference] | The detailed reference for spaCy's API. |
|
| [API Reference] | The detailed reference for spaCy's API. |
|
||||||
| [Models] | Download statistical language models for spaCy. |
|
| [Models] | Download statistical language models for spaCy. |
|
||||||
| [Universe] | Libraries, extensions, demos, books and courses. |
|
| [Universe] | Libraries, extensions, demos, books and courses. |
|
||||||
| [Changelog] | Changes and version history. |
|
| [Changelog] | Changes and version history. |
|
||||||
| [Contribute] | How to contribute to the spaCy project and code base. |
|
| [Contribute] | How to contribute to the spaCy project and code base. |
|
||||||
|
|
||||||
[spaCy 101]: https://spacy.io/usage/spacy-101
|
[spacy 101]: https://spacy.io/usage/spacy-101
|
||||||
[New in v2.0]: https://spacy.io/usage/v2#migrating
|
[new in v2.1]: https://spacy.io/usage/v2-1
|
||||||
[Usage Guides]: https://spacy.io/usage/
|
[usage guides]: https://spacy.io/usage/
|
||||||
[API Reference]: https://spacy.io/api/
|
[api reference]: https://spacy.io/api/
|
||||||
[Models]: https://spacy.io/models
|
[models]: https://spacy.io/models
|
||||||
[Universe]: https://spacy.io/universe
|
[universe]: https://spacy.io/universe
|
||||||
[Changelog]: https://spacy.io/usage/#changelog
|
[changelog]: https://spacy.io/usage/#changelog
|
||||||
[Contribute]: https://github.com/explosion/spaCy/blob/master/CONTRIBUTING.md
|
[contribute]: https://github.com/explosion/spaCy/blob/master/CONTRIBUTING.md
|
||||||
|
|
||||||
## 💬 Where to ask questions
|
## 💬 Where to ask questions
|
||||||
|
|
||||||
|
@ -51,33 +52,36 @@ and [@ines](https://github.com/ines). Please understand that we won't be able
|
||||||
to provide individual support via email. We also believe that help is much more
|
to provide individual support via email. We also believe that help is much more
|
||||||
valuable if it's shared publicly, so that more people can benefit from it.
|
valuable if it's shared publicly, so that more people can benefit from it.
|
||||||
|
|
||||||
* **Bug Reports**: [GitHub Issue Tracker]
|
| Type | Platforms |
|
||||||
* **Usage Questions**: [Stack Overflow] · [Gitter Chat] · [Reddit User Group]
|
| ------------------------ | ------------------------------------------------------ |
|
||||||
* **General Discussion**: [Gitter Chat] · [Reddit User Group]
|
| 🚨**Bug Reports** | [GitHub Issue Tracker] |
|
||||||
|
| 🎁 **Feature Requests** | [GitHub Issue Tracker] |
|
||||||
|
| 👩💻**Usage Questions** | [Stack Overflow] · [Gitter Chat] · [Reddit User Group] |
|
||||||
|
| 🗯 **General Discussion** | [Gitter Chat] · [Reddit User Group] |
|
||||||
|
|
||||||
[GitHub Issue Tracker]: https://github.com/explosion/spaCy/issues
|
[github issue tracker]: https://github.com/explosion/spaCy/issues
|
||||||
[Stack Overflow]: http://stackoverflow.com/questions/tagged/spacy
|
[stack overflow]: http://stackoverflow.com/questions/tagged/spacy
|
||||||
[Gitter Chat]: https://gitter.im/explosion/spaCy
|
[gitter chat]: https://gitter.im/explosion/spaCy
|
||||||
[Reddit User Group]: https://www.reddit.com/r/spacynlp
|
[reddit user group]: https://www.reddit.com/r/spacynlp
|
||||||
|
|
||||||
## Features
|
## Features
|
||||||
|
|
||||||
* **Fastest syntactic parser** in the world
|
- **Fastest syntactic parser** in the world
|
||||||
* **Named entity** recognition
|
- **Named entity** recognition
|
||||||
* Non-destructive **tokenization**
|
- Non-destructive **tokenization**
|
||||||
* Support for **30+ languages**
|
- Support for **45+ languages**
|
||||||
* Pre-trained [statistical models](https://spacy.io/models) and word vectors
|
- Pre-trained [statistical models](https://spacy.io/models) and word vectors
|
||||||
* Easy **deep learning** integration
|
- Easy **deep learning** integration
|
||||||
* Part-of-speech tagging
|
- Part-of-speech tagging
|
||||||
* Labelled dependency parsing
|
- Labelled dependency parsing
|
||||||
* Syntax-driven sentence segmentation
|
- Syntax-driven sentence segmentation
|
||||||
* Built in **visualizers** for syntax and NER
|
- Built in **visualizers** for syntax and NER
|
||||||
* Convenient string-to-hash mapping
|
- Convenient string-to-hash mapping
|
||||||
* Export to numpy data arrays
|
- Export to numpy data arrays
|
||||||
* Efficient binary serialization
|
- Efficient binary serialization
|
||||||
* Easy **model packaging** and deployment
|
- Easy **model packaging** and deployment
|
||||||
* State-of-the-art speed
|
- State-of-the-art speed
|
||||||
* Robust, rigorously evaluated accuracy
|
- Robust, rigorously evaluated accuracy
|
||||||
|
|
||||||
📖 **For more details, see the
|
📖 **For more details, see the
|
||||||
[facts, figures and benchmarks](https://spacy.io/usage/facts-figures).**
|
[facts, figures and benchmarks](https://spacy.io/usage/facts-figures).**
|
||||||
|
@ -87,9 +91,9 @@ valuable if it's shared publicly, so that more people can benefit from it.
|
||||||
For detailed installation instructions, see the
|
For detailed installation instructions, see the
|
||||||
[documentation](https://spacy.io/usage).
|
[documentation](https://spacy.io/usage).
|
||||||
|
|
||||||
* **Operating system**: macOS / OS X · Linux · Windows (Cygwin, MinGW, Visual Studio)
|
- **Operating system**: macOS / OS X · Linux · Windows (Cygwin, MinGW, Visual Studio)
|
||||||
* **Python version**: Python 2.7, 3.4+ (only 64 bit)
|
- **Python version**: Python 2.7, 3.4+ (only 64 bit)
|
||||||
* **Package managers**: [pip] · [conda] (via `conda-forge`)
|
- **Package managers**: [pip] · [conda] (via `conda-forge`)
|
||||||
|
|
||||||
[pip]: https://pypi.python.org/pypi/spacy
|
[pip]: https://pypi.python.org/pypi/spacy
|
||||||
[conda]: https://anaconda.org/conda-forge/spacy
|
[conda]: https://anaconda.org/conda-forge/spacy
|
||||||
|
@ -153,12 +157,12 @@ other module. Models can be installed using spaCy's `download` command,
|
||||||
or manually by pointing pip to a path or URL.
|
or manually by pointing pip to a path or URL.
|
||||||
|
|
||||||
| Documentation | |
|
| Documentation | |
|
||||||
| --- | --- |
|
| ---------------------- | ------------------------------------------------------------- |
|
||||||
| [Available Models] | Detailed model descriptions, accuracy figures and benchmarks. |
|
| [Available Models] | Detailed model descriptions, accuracy figures and benchmarks. |
|
||||||
| [Models Documentation] | Detailed usage instructions. |
|
| [Models Documentation] | Detailed usage instructions. |
|
||||||
|
|
||||||
[Available Models]: https://spacy.io/models
|
[available models]: https://spacy.io/models
|
||||||
[Models Documentation]: https://spacy.io/docs/usage/models
|
[models documentation]: https://spacy.io/docs/usage/models
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# out-of-the-box: download best-matching default model
|
# out-of-the-box: download best-matching default model
|
||||||
|
|
|
@ -38,16 +38,18 @@ shortcut for this and instantiate the component using its string name and
|
||||||
> ```
|
> ```
|
||||||
|
|
||||||
| Name | Type | Description |
|
| Name | Type | Description |
|
||||||
| ----------- | ------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
|
| ----------- | ----------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||||
| `vocab` | `Vocab` | The shared vocabulary. |
|
| `vocab` | `Vocab` | The shared vocabulary. |
|
||||||
| `model` | `thinc.neural.Model` or `True` | The model powering the pipeline component. If no model is supplied, the model is created when you call `begin_training`, `from_disk` or `from_bytes`. |
|
| `model` | `thinc.neural.Model` / `True` | The model powering the pipeline component. If no model is supplied, the model is created when you call `begin_training`, `from_disk` or `from_bytes`. |
|
||||||
| `**cfg` | - | Configuration parameters. |
|
| `**cfg` | - | Configuration parameters. |
|
||||||
| **RETURNS** | `DependencyParser` | The newly constructed object. |
|
| **RETURNS** | `DependencyParser` | The newly constructed object. |
|
||||||
|
|
||||||
## DependencyParser.\_\_call\_\_ {#call tag="method"}
|
## DependencyParser.\_\_call\_\_ {#call tag="method"}
|
||||||
|
|
||||||
Apply the pipe to one document. The document is modified in place, and returned.
|
Apply the pipe to one document. The document is modified in place, and returned.
|
||||||
Both [`__call__`](/api/dependencyparser#call) and
|
This usually happens under the hood when you call the `nlp` object on a text and
|
||||||
|
all pipeline components are applied to the `Doc` in order. Both
|
||||||
|
[`__call__`](/api/dependencyparser#call) and
|
||||||
[`pipe`](/api/dependencyparser#pipe) delegate to the
|
[`pipe`](/api/dependencyparser#pipe) delegate to the
|
||||||
[`predict`](/api/dependencyparser#predict) and
|
[`predict`](/api/dependencyparser#predict) and
|
||||||
[`set_annotations`](/api/dependencyparser#set_annotations) methods.
|
[`set_annotations`](/api/dependencyparser#set_annotations) methods.
|
||||||
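The delegation described in this hunk can be sketched without spaCy. The following is a minimal, hypothetical illustration and not spaCy's implementation: `ToyDoc`, `ToyPipe` and the length-based "annotation" are invented names for the example. Only the method names `__call__`, `pipe`, `predict` and `set_annotations` come from the documentation text.

```python
class ToyDoc:
    """Hypothetical stand-in for a Doc: holds text plus one annotation slot."""

    def __init__(self, text):
        self.text = text
        self.label = None


class ToyPipe:
    """Hypothetical component showing the predict/set_annotations split."""

    def predict(self, docs):
        # Compute scores without modifying the documents.
        return [len(doc.text) for doc in docs]

    def set_annotations(self, docs, scores):
        # Write the predictions onto the documents in place.
        for doc, score in zip(docs, scores):
            doc.label = "long" if score > 10 else "short"

    def __call__(self, doc):
        # One document: predict, annotate in place, return the same object.
        scores = self.predict([doc])
        self.set_annotations([doc], scores)
        return doc

    def pipe(self, stream):
        # A stream of documents: the same two steps, one document at a time.
        for doc in stream:
            yield self(doc)


doc = ToyDoc("This is a sentence.")
processed = ToyPipe()(doc)
```

Because the document is modified in place, `processed` and `doc` refer to the same object here.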
|
@ -57,6 +59,7 @@ Both [`__call__`](/api/dependencyparser#call) and
|
||||||
> ```python
|
> ```python
|
||||||
> parser = DependencyParser(nlp.vocab)
|
> parser = DependencyParser(nlp.vocab)
|
||||||
> doc = nlp(u"This is a sentence.")
|
> doc = nlp(u"This is a sentence.")
|
||||||
|
> # This usually happens under the hood
|
||||||
> processed = parser(doc)
|
> processed = parser(doc)
|
||||||
> ```
|
> ```
|
||||||
|
|
||||||
|
@ -83,7 +86,7 @@ Apply the pipe to a stream of documents. Both
|
||||||
> ```
|
> ```
|
||||||
|
|
||||||
| Name | Type | Description |
|
| Name | Type | Description |
|
||||||
| ------------ | -------- | -------------------------------------------------------------------------------------------------------------- |
|
| ------------ | -------- | ------------------------------------------------------ |
|
||||||
| `stream` | iterable | A stream of documents. |
|
| `stream` | iterable | A stream of documents. |
|
||||||
| `batch_size` | int | The number of texts to buffer. Defaults to `128`. |
|
| `batch_size` | int | The number of texts to buffer. Defaults to `128`. |
|
||||||
| **YIELDS** | `Doc` | Processed documents in the order of the original text. |
|
| **YIELDS** | `Doc` | Processed documents in the order of the original text. |
|
||||||
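The `batch_size` and **YIELDS** rows describe buffering a stream while preserving input order. A minimal pure-Python sketch of that idea follows, under the assumption that batches are processed one after another. `minibatch`, `toy_pipe` and `process_batch` are hypothetical names for illustration, not spaCy functions.

```python
from itertools import islice


def minibatch(stream, batch_size=128):
    """Yield lists of up to batch_size items from any iterable."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def toy_pipe(stream, process_batch, batch_size=128):
    """Buffer items, process each batch, and yield results in input order."""
    for batch in minibatch(stream, batch_size):
        for item in process_batch(batch):
            yield item


texts = ["a", "bb", "ccc", "dddd", "eeeee"]
results = list(toy_pipe(texts, lambda batch: [t.upper() for t in batch], batch_size=2))
```

Since `toy_pipe` is a generator, nothing is read from `stream` until the caller starts iterating, and at most `batch_size` items are buffered at a time.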
|
|
|
@ -38,16 +38,18 @@ shortcut for this and instantiate the component using its string name and
|
||||||
> ```
|
> ```
|
||||||
|
|
||||||
| Name | Type | Description |
|
| Name | Type | Description |
|
||||||
| ----------- | ------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
|
| ----------- | ----------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||||
| `vocab` | `Vocab` | The shared vocabulary. |
|
| `vocab` | `Vocab` | The shared vocabulary. |
|
||||||
| `model` | `thinc.neural.Model` or `True` | The model powering the pipeline component. If no model is supplied, the model is created when you call `begin_training`, `from_disk` or `from_bytes`. |
|
| `model` | `thinc.neural.Model` / `True` | The model powering the pipeline component. If no model is supplied, the model is created when you call `begin_training`, `from_disk` or `from_bytes`. |
|
||||||
| `**cfg` | - | Configuration parameters. |
|
| `**cfg` | - | Configuration parameters. |
|
||||||
| **RETURNS** | `EntityRecognizer` | The newly constructed object. |
|
| **RETURNS** | `EntityRecognizer` | The newly constructed object. |
|
||||||
|
|
||||||
## EntityRecognizer.\_\_call\_\_ {#call tag="method"}
|
## EntityRecognizer.\_\_call\_\_ {#call tag="method"}
|
||||||
|
|
||||||
Apply the pipe to one document. The document is modified in place, and returned.
|
Apply the pipe to one document. The document is modified in place, and returned.
|
||||||
Both [`__call__`](/api/entityrecognizer#call) and
|
This usually happens under the hood when you call the `nlp` object on a text and
|
||||||
|
all pipeline components are applied to the `Doc` in order. Both
|
||||||
|
[`__call__`](/api/entityrecognizer#call) and
|
||||||
[`pipe`](/api/entityrecognizer#pipe) delegate to the
|
[`pipe`](/api/entityrecognizer#pipe) delegate to the
|
||||||
[`predict`](/api/entityrecognizer#predict) and
|
[`predict`](/api/entityrecognizer#predict) and
|
||||||
[`set_annotations`](/api/entityrecognizer#set_annotations) methods.
|
[`set_annotations`](/api/entityrecognizer#set_annotations) methods.
|
||||||
|
@ -57,6 +59,7 @@ Both [`__call__`](/api/entityrecognizer#call) and
|
||||||
> ```python
|
> ```python
|
||||||
> ner = EntityRecognizer(nlp.vocab)
|
> ner = EntityRecognizer(nlp.vocab)
|
||||||
> doc = nlp(u"This is a sentence.")
|
> doc = nlp(u"This is a sentence.")
|
||||||
|
> # This usually happens under the hood
|
||||||
> processed = ner(doc)
|
> processed = ner(doc)
|
||||||
> ```
|
> ```
|
||||||
|
|
||||||
|
@ -83,7 +86,7 @@ Apply the pipe to a stream of documents. Both
|
||||||
> ```
|
> ```
|
||||||
|
|
||||||
| Name | Type | Description |
|
| Name | Type | Description |
|
||||||
| ------------ | -------- | -------------------------------------------------------------------------------------------------------------- |
|
| ------------ | -------- | ------------------------------------------------------ |
|
||||||
| `stream` | iterable | A stream of documents. |
|
| `stream` | iterable | A stream of documents. |
|
||||||
| `batch_size` | int | The number of texts to buffer. Defaults to `128`. |
|
| `batch_size` | int | The number of texts to buffer. Defaults to `128`. |
|
||||||
| **YIELDS** | `Doc` | Processed documents in the order of the original text. |
|
| **YIELDS** | `Doc` | Processed documents in the order of the original text. |
|
||||||
|
|
|
@ -38,17 +38,19 @@ shortcut for this and instantiate the component using its string name and
|
||||||
> ```
|
> ```
|
||||||
|
|
||||||
| Name | Type | Description |
|
| Name | Type | Description |
|
||||||
| ----------- | ------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
|
| ----------- | ----------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||||
| `vocab` | `Vocab` | The shared vocabulary. |
|
| `vocab` | `Vocab` | The shared vocabulary. |
|
||||||
| `model` | `thinc.neural.Model` or `True` | The model powering the pipeline component. If no model is supplied, the model is created when you call `begin_training`, `from_disk` or `from_bytes`. |
|
| `model` | `thinc.neural.Model` / `True` | The model powering the pipeline component. If no model is supplied, the model is created when you call `begin_training`, `from_disk` or `from_bytes`. |
|
||||||
| `**cfg` | - | Configuration parameters. |
|
| `**cfg` | - | Configuration parameters. |
|
||||||
| **RETURNS** | `Tagger` | The newly constructed object. |
|
| **RETURNS** | `Tagger` | The newly constructed object. |
|
||||||
|
|
||||||
## Tagger.\_\_call\_\_ {#call tag="method"}
|
## Tagger.\_\_call\_\_ {#call tag="method"}
|
||||||
|
|
||||||
Apply the pipe to one document. The document is modified in place, and returned.
|
Apply the pipe to one document. The document is modified in place, and returned.
|
||||||
Both [`__call__`](/api/tagger#call) and [`pipe`](/api/tagger#pipe) delegate to
|
This usually happens under the hood when you call the `nlp` object on a text and
|
||||||
the [`predict`](/api/tagger#predict) and
|
all pipeline components are applied to the `Doc` in order. Both
|
||||||
|
[`__call__`](/api/tagger#call) and [`pipe`](/api/tagger#pipe) delegate to the
|
||||||
|
[`predict`](/api/tagger#predict) and
|
||||||
[`set_annotations`](/api/tagger#set_annotations) methods.
|
[`set_annotations`](/api/tagger#set_annotations) methods.
|
||||||
|
|
||||||
> #### Example
|
> #### Example
|
||||||
|
@ -56,6 +58,7 @@ the [`predict`](/api/tagger#predict) and
|
||||||
> ```python
|
> ```python
|
||||||
> tagger = Tagger(nlp.vocab)
|
> tagger = Tagger(nlp.vocab)
|
||||||
> doc = nlp(u"This is a sentence.")
|
> doc = nlp(u"This is a sentence.")
|
||||||
|
> # This usually happens under the hood
|
||||||
> processed = tagger(doc)
|
> processed = tagger(doc)
|
||||||
> ```
|
> ```
|
||||||
|
|
||||||
|
@ -80,7 +83,7 @@ Apply the pipe to a stream of documents. Both [`__call__`](/api/tagger#call) and
|
||||||
> ```
|
> ```
|
||||||
|
|
||||||
| Name | Type | Description |
|
| Name | Type | Description |
|
||||||
| ------------ | -------- | -------------------------------------------------------------------------------------------------------------- |
|
| ------------ | -------- | ------------------------------------------------------ |
|
||||||
| `stream` | iterable | A stream of documents. |
|
| `stream` | iterable | A stream of documents. |
|
||||||
| `batch_size` | int | The number of texts to buffer. Defaults to `128`. |
|
| `batch_size` | int | The number of texts to buffer. Defaults to `128`. |
|
||||||
| **YIELDS** | `Doc` | Processed documents in the order of the original text. |
|
| **YIELDS** | `Doc` | Processed documents in the order of the original text. |
|
||||||
|
|
|
@ -31,6 +31,7 @@ shortcut for this and instantiate the component using its string name and
|
||||||
> ```python
|
> ```python
|
||||||
> # Construction via create_pipe
|
> # Construction via create_pipe
|
||||||
> textcat = nlp.create_pipe("textcat")
|
> textcat = nlp.create_pipe("textcat")
|
||||||
|
> textcat = nlp.create_pipe("textcat", config={"exclusive_classes": True})
|
||||||
>
|
>
|
||||||
> # Construction from class
|
> # Construction from class
|
||||||
> from spacy.pipeline import TextCategorizer
|
> from spacy.pipeline import TextCategorizer
|
||||||
|
@ -39,18 +40,34 @@ shortcut for this and instantiate the component using its string name and
|
||||||
> ```
|
> ```
|
||||||
|
|
||||||
| Name | Type | Description |
|
| Name | Type | Description |
|
||||||
| ----------- | ------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
|
| ------------------- | ----------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||||
| `vocab` | `Vocab` | The shared vocabulary. |
|
| `vocab` | `Vocab` | The shared vocabulary. |
|
||||||
| `model` | `thinc.neural.Model` or `True` | The model powering the pipeline component. If no model is supplied, the model is created when you call `begin_training`, `from_disk` or `from_bytes`. |
|
| `model` | `thinc.neural.Model` / `True` | The model powering the pipeline component. If no model is supplied, the model is created when you call `begin_training`, `from_disk` or `from_bytes`. |
|
||||||
| `**cfg` | - | Configuration parameters. |
|
| `exclusive_classes` | bool | Make categories mutually exclusive. Defaults to `False`. |
|
||||||
|
| `architecture` | unicode | Model architecture to use, see [architectures](#architectures) for details. Defaults to `"ensemble"`. |
|
||||||
| **RETURNS** | `TextCategorizer` | The newly constructed object. |
|
| **RETURNS** | `TextCategorizer` | The newly constructed object. |
|
||||||
|
|
||||||
|
### Architectures {#architectures new="2.1"}
|
||||||
|
|
||||||
|
Text classification models can be used to solve a wide variety of problems.
|
||||||
|
Differences in text length, number of labels, difficulty, and runtime
|
||||||
|
performance constraints mean that no single algorithm performs well on all types
|
||||||
|
of problems. To handle a wider variety of problems, the `TextCategorizer` object
|
||||||
|
allows configuration of its model architecture, using the `architecture` keyword
|
||||||
|
argument.
|
||||||
|
|
||||||
|
| Name | Description |
|
||||||
|
| -------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||||
|
| `"ensemble"` | **Default:** Stacked ensemble of a unigram bag-of-words model and a neural network model. The neural network uses a CNN with mean pooling and attention. |
|
||||||
|
| `"simple_cnn"` | A neural network model where token vectors are calculated using a CNN. The vectors are mean pooled and used as features in a feed-forward network. |
|
||||||
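Mean pooling, mentioned in both rows, reduces a variable number of token vectors to a single fixed-width document vector. A minimal pure-Python sketch of that one step is shown below. The function name `mean_pool` and the toy vectors are assumptions for illustration; the actual spaCy models are not reproduced here.

```python
def mean_pool(token_vectors):
    """Average a list of equal-length token vectors into one vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    width = len(token_vectors[0])
    totals = [0.0] * width
    for vector in token_vectors:
        for i, value in enumerate(vector):
            totals[i] += value
    return [total / len(token_vectors) for total in totals]


# Three hypothetical 2-dimensional token vectors for one document.
doc_vector = mean_pool([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```

The pooled vector has the same width regardless of how many tokens the text contains, which is what lets it serve as input to a fixed-size feed-forward layer.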
|
|
||||||
## TextCategorizer.\_\_call\_\_ {#call tag="method"}
|
## TextCategorizer.\_\_call\_\_ {#call tag="method"}
|
||||||
|
|
||||||
Apply the pipe to one document. The document is modified in place, and returned.
|
Apply the pipe to one document. The document is modified in place, and returned.
|
||||||
Both [`__call__`](/api/textcategorizer#call) and
|
This usually happens under the hood when you call the `nlp` object on a text and
|
||||||
[`pipe`](/api/textcategorizer#pipe) delegate to the
|
all pipeline components are applied to the `Doc` in order. Both
|
||||||
[`predict`](/api/textcategorizer#predict) and
|
[`__call__`](/api/textcategorizer#call) and [`pipe`](/api/textcategorizer#pipe)
|
||||||
|
delegate to the [`predict`](/api/textcategorizer#predict) and
|
||||||
[`set_annotations`](/api/textcategorizer#set_annotations) methods.
|
[`set_annotations`](/api/textcategorizer#set_annotations) methods.
|
||||||
|
|
||||||
> #### Example
|
> #### Example
|
||||||
|
@ -58,6 +75,7 @@ Both [`__call__`](/api/textcategorizer#call) and
|
||||||
> ```python
|
> ```python
|
||||||
> textcat = TextCategorizer(nlp.vocab)
|
> textcat = TextCategorizer(nlp.vocab)
|
||||||
> doc = nlp(u"This is a sentence.")
|
> doc = nlp(u"This is a sentence.")
|
||||||
|
> # This usually happens under the hood
|
||||||
> processed = textcat(doc)
|
> processed = textcat(doc)
|
||||||
> ```
|
> ```
|
||||||
|
|
||||||
|
|