mirror of
https://github.com/explosion/spaCy.git
synced 2025-01-13 10:46:29 +03:00
Update docs [ci skip]
parent 67652bcbb5
commit 329b61ee7b
@@ -4,7 +4,6 @@ tag: class
 source: spacy/pipeline/attributeruler.py
 new: 3
 teaser: 'Pipeline component for rule-based token attribute assignment'
-api_base_class: /api/pipe
 api_string_name: attribute_ruler
 api_trainable: false
 ---
@@ -14,7 +14,7 @@ for how to use the `TrainablePipe` base class to implement custom components.
 
 <!-- TODO: Pipe vs TrainablePipe, check methods below (all renamed to TrainablePipe for now) -->
 
-> #### Why is TrainablePipe implemented in Cython?
+> #### Why is it implemented in Cython?
 >
 > The `TrainablePipe` class is implemented in a `.pyx` module, the extension
 > used by [Cython](/api/cython). This is needed so that **other** Cython
@@ -3,7 +3,6 @@ title: Sentencizer
 tag: class
 source: spacy/pipeline/sentencizer.pyx
 teaser: 'Pipeline component for rule-based sentence boundary detection'
-api_base_class: /api/pipe
 api_string_name: sentencizer
 api_trainable: false
 ---
@@ -130,9 +129,9 @@ Score a batch of examples.
 
 ## Sentencizer.to_disk {#to_disk tag="method"}
 
-Save the sentencizer settings (punctuation characters) to a directory. Will create
-a file `sentencizer.json`. This also happens automatically when you save an
-`nlp` object with a sentencizer added to its pipeline.
+Save the sentencizer settings (punctuation characters) to a directory. Will
+create a file `sentencizer.json`. This also happens automatically when you save
+an `nlp` object with a sentencizer added to its pipeline.
 
 > #### Example
 >
@@ -20,13 +20,13 @@ It also orchestrates training and serialization.
 
 | Name | Description |
 | --------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| [`Language`](/api/language) | Processing class that turns text into `Doc` objects. Different languages implement their own subclasses of it. The variable is typically called `nlp`. |
 | [`Doc`](/api/doc) | A container for accessing linguistic annotations. |
+| [`DocBin`](/api/docbin) | A collection of `Doc` objects for efficient binary serialization. Also used for [training data](/api/data-formats#binary-training). |
+| [`Example`](/api/example) | A collection of training annotations, containing two `Doc` objects: the reference data and the predictions. |
+| [`Language`](/api/language) | Processing class that turns text into `Doc` objects. Different languages implement their own subclasses of it. The variable is typically called `nlp`. |
+| [`Lexeme`](/api/lexeme) | An entry in the vocabulary. It's a word type with no context, as opposed to a word token. It therefore has no part-of-speech tag, dependency parse etc. |
 | [`Span`](/api/span) | A slice from a `Doc` object. |
 | [`Token`](/api/token) | An individual token — i.e. a word, punctuation symbol, whitespace, etc. |
-| [`Lexeme`](/api/lexeme) | An entry in the vocabulary. It's a word type with no context, as opposed to a word token. It therefore has no part-of-speech tag, dependency parse etc. |
-| [`Example`](/api/example) | A collection of training annotations, containing two `Doc` objects: the reference data and the predictions. |
-| [`DocBin`](/api/docbin) | A collection of `Doc` objects for efficient binary serialization. Also used for [training data](/api/data-formats#binary-training). |
 
 ### Processing pipeline {#architecture-pipeline}
 
@@ -42,23 +42,22 @@ components for different language processing tasks and also allows adding
 
 | Name | Description |
 | ----------------------------------------------- | ------------------------------------------------------------------------------------------- |
-| [`Tokenizer`](/api/tokenizer) | Segment raw text and create `Doc` objects from the words. |
-| [`Tok2Vec`](/api/tok2vec) | Apply a "token-to-vector" model and set its outputs. |
-| [`Transformer`](/api/transformer) | Use a transformer model and set its outputs. |
-| [`Lemmatizer`](/api/lemmatizer) | Determine the base forms of words. |
-| [`Morphologizer`](/api/morphologizer) | Predict morphological features and coarse-grained part-of-speech tags. |
-| [`Tagger`](/api/tagger) | Predict part-of-speech tags. |
 | [`AttributeRuler`](/api/attributeruler) | Set token attributes using matcher rules. |
 | [`DependencyParser`](/api/dependencyparser) | Predict syntactic dependencies. |
+| [`EntityLinker`](/api/entitylinker) | Disambiguate named entities to nodes in a knowledge base. |
 | [`EntityRecognizer`](/api/entityrecognizer) | Predict named entities, e.g. persons or products. |
 | [`EntityRuler`](/api/entityruler) | Add entity spans to the `Doc` using token-based rules or exact phrase matches. |
-| [`EntityLinker`](/api/entitylinker) | Disambiguate named entities to nodes in a knowledge base. |
-| [`TextCategorizer`](/api/textcategorizer) | Predict categories or labels over the whole document. |
-| [`Sentencizer`](/api/sentencizer) | Implement rule-based sentence boundary detection that doesn't require the dependency parse. |
+| [`Lemmatizer`](/api/lemmatizer) | Determine the base forms of words. |
+| [`Morphologizer`](/api/morphologizer) | Predict morphological features and coarse-grained part-of-speech tags. |
 | [`SentenceRecognizer`](/api/sentencerecognizer) | Predict sentence boundaries. |
-| [Other functions](/api/pipeline-functions) | Automatically apply something to the `Doc`, e.g. to merge spans of tokens. |
-| [`Pipe`](/api/pipe) | Base class that pipeline components may inherit from. |
+| [`Sentencizer`](/api/sentencizer) | Implement rule-based sentence boundary detection that doesn't require the dependency parse. |
+| [`Tagger`](/api/tagger) | Predict part-of-speech tags. |
+| [`TextCategorizer`](/api/textcategorizer) | Predict categories or labels over the whole document. |
+| [`Tok2Vec`](/api/tok2vec) | Apply a "token-to-vector" model and set its outputs. |
+| [`Tokenizer`](/api/tokenizer) | Segment raw text and create `Doc` objects from the words. |
 | [`TrainablePipe`](/api/pipe) | Class that all trainable pipeline components inherit from. |
+| [`Transformer`](/api/transformer) | Use a transformer model and set its outputs. |
+| [Other functions](/api/pipeline-functions) | Automatically apply something to the `Doc`, e.g. to merge spans of tokens. |
 
 ### Matchers {#architecture-matchers}
 
@@ -68,20 +67,20 @@ operates on a `Doc` and gives you access to the matched tokens **in context**.
 
 | Name | Description |
 | --------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| [`DependencyMatcher`](/api/dependencymatcher) | Match sequences of tokens based on dependency trees using [Semgrex operators](https://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/semgraph/semgrex/SemgrexPattern.html). |
 | [`Matcher`](/api/matcher) | Match sequences of tokens, based on pattern rules, similar to regular expressions. |
 | [`PhraseMatcher`](/api/phrasematcher) | Match sequences of tokens based on phrases. |
-| [`DependencyMatcher`](/api/dependencymatcher) | Match sequences of tokens based on dependency trees using [Semgrex operators](https://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/semgraph/semgrex/SemgrexPattern.html). |
 
 ### Other classes {#architecture-other}
 
 | Name | Description |
 | ------------------------------------------------ | -------------------------------------------------------------------------------------------------- |
-| [`Vocab`](/api/vocab) | The shared vocabulary that stores strings and gives you access to [`Lexeme`](/api/lexeme) objects. |
+| [`Corpus`](/api/corpus) | Class for managing annotated corpora for training and evaluation data. |
+| [`KnowledgeBase`](/api/kb) | Storage for entities and aliases of a knowledge base for entity linking. |
+| [`Lookups`](/api/lookups) | Container for convenient access to large lookup tables and dictionaries. |
+| [`MorphAnalysis`](/api/morphology#morphanalysis) | A morphological analysis. |
+| [`Morphology`](/api/morphology) | Store morphological analyses and map them to and from hash values. |
+| [`Scorer`](/api/scorer) | Compute evaluation scores. |
 | [`StringStore`](/api/stringstore) | Map strings to and from hash values. |
 | [`Vectors`](/api/vectors) | Container class for vector data keyed by string. |
-| [`Lookups`](/api/lookups) | Container for convenient access to large lookup tables and dictionaries. |
-| [`Morphology`](/api/morphology) | Store morphological analyses and map them to and from hash values. |
-| [`MorphAnalysis`](/api/morphology#morphanalysis) | A morphological analysis. |
-| [`KnowledgeBase`](/api/kb) | Storage for entities and aliases of a knowledge base for entity linking. |
-| [`Scorer`](/api/scorer) | Compute evaluation scores. |
-| [`Corpus`](/api/corpus) | Class for managing annotated corpora for training and evaluation data. |
+| [`Vocab`](/api/vocab) | The shared vocabulary that stores strings and gives you access to [`Lexeme`](/api/lexeme) objects. |
@@ -94,13 +94,13 @@
 { "text": "EntityRuler", "url": "/api/entityruler" },
 { "text": "Lemmatizer", "url": "/api/lemmatizer" },
 { "text": "Morphologizer", "url": "/api/morphologizer" },
-{ "text": "Pipe", "url": "/api/pipe" },
 { "text": "SentenceRecognizer", "url": "/api/sentencerecognizer" },
 { "text": "Sentencizer", "url": "/api/sentencizer" },
 { "text": "Tagger", "url": "/api/tagger" },
 { "text": "TextCategorizer", "url": "/api/textcategorizer" },
 { "text": "Tok2Vec", "url": "/api/tok2vec" },
 { "text": "Tokenizer", "url": "/api/tokenizer" },
+{ "text": "TrainablePipe", "url": "/api/pipe" },
 { "text": "Transformer", "url": "/api/transformer" },
 { "text": "Other Functions", "url": "/api/pipeline-functions" }
 ]