title | tag | source | new | teaser | api_string_name | api_trainable |
---|---|---|---|---|---|---|
AttributeRuler | class | spacy/pipeline/attributeruler.py | 3 | Pipeline component for rule-based token attribute assignment | attribute_ruler | false |
The attribute ruler lets you set token attributes for tokens identified by
Matcher patterns. The attribute ruler is typically used to handle exceptions
for token attributes and to map values between attributes, such as mapping
fine-grained POS tags to coarse-grained POS tags.
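Both uses can be sketched in a few lines. This is a minimal illustration, assuming spaCy v3 is installed; the blank English pipeline, the sample words and the hand-assigned tags are assumptions made for the example, since a blank pipeline has no tagger.

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")

# Exception handling: always lemmatize the token "an" as "a".
ruler.add(patterns=[[{"LOWER": "an"}]], attrs={"LEMMA": "a"})
# Attribute mapping: fine-grained tag "VB" -> coarse-grained POS "VERB".
ruler.add(patterns=[[{"TAG": "VB"}]], attrs={"POS": "VERB"})

# The tags are supplied by hand here (illustrative values only).
doc = Doc(nlp.vocab, words=["eat", "an", "apple"], tags=["VB", "DT", "NN"])
doc = ruler(doc)

lemma_of_an = doc[1].lemma_
pos_of_eat = doc[0].pos_
print(lemma_of_an, pos_of_eat)
```

In a trained pipeline the tags would normally come from an upstream tagger, so the ruler is usually placed after it.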
Config and implementation
The default config is defined by the pipeline component factory and describes
how the component should be configured. You can override its settings via the
config argument on nlp.add_pipe or in your config.cfg for training.
Example
config = {
    "validation": True,
    "pattern_dicts": None,
}
nlp.add_pipe("attribute_ruler", config=config)
Setting | Type | Description | Default |
---|---|---|---|
pattern_dicts | Iterable[dict] | A list of pattern dicts with the keys as the arguments to AttributeRuler.add (patterns/attrs/index) to add as patterns. | None |
validation | bool | Whether patterns should be validated, passed to Matcher as validate. | False |
https://github.com/explosion/spaCy/blob/develop/spacy/pipeline/attributeruler.py
AttributeRuler.__init__
Initialize the attribute ruler. If pattern dicts are supplied here, they need to
be a list of dictionaries with "patterns", "attrs", and optional "index" keys,
e.g.:
pattern_dicts = [
    {"patterns": [[{"TAG": "VB"}]], "attrs": {"POS": "VERB"}},
    {"patterns": [[{"LOWER": "an"}]], "attrs": {"LEMMA": "a"}},
]
Example
# Construction via add_pipe
attribute_ruler = nlp.add_pipe("attribute_ruler")
Name | Type | Description |
---|---|---|
vocab | Vocab | The shared vocabulary to pass to the matcher. |
name | str | Instance name of the current pipeline component. Typically passed in automatically from the factory when the component is added. |
keyword-only | ||
pattern_dicts | Iterable[Dict] | Optional patterns to load in on initialization. |
validate | bool | Whether patterns should be validated, passed to Matcher as validate. Defaults to False. |
AttributeRuler.__call__
Apply the attribute ruler to a Doc, setting token attributes for tokens matched by the provided patterns.
Name | Type | Description |
---|---|---|
doc | Doc | The Doc object to process, e.g. the Doc in the pipeline. |
RETURNS | Doc | The modified Doc with updated token attributes. |
AttributeRuler.add
Add patterns to the attribute ruler. The patterns are a list of Matcher
patterns and the attributes are a dict of attributes to set on the matched
token. If the pattern matches a span of more than one token, the index can be
used to set the attributes for the token at that index in the span. The index
may be negative to index from the end of the span.
Example
attribute_ruler = nlp.add_pipe("attribute_ruler")
patterns = [[{"TAG": "VB"}]]
attrs = {"POS": "VERB"}
attribute_ruler.add(patterns=patterns, attrs=attrs)
Name | Type | Description |
---|---|---|
patterns | Iterable[List[Dict]] | A list of Matcher patterns. |
attrs | dict | The attributes to assign to the target token in the matched span. |
index | int | The index of the token in the matched span to modify. May be negative to index from the end of the span. Defaults to 0. |
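The effect of a negative index on a multi-token match can be sketched as follows. This is a hedged illustration, assuming spaCy v3 and a blank English pipeline; the sentence and the chosen lemma are made up for the example.

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")

# The pattern matches a two-token span; index=-1 targets its last token.
ruler.add(
    patterns=[[{"LOWER": "two"}, {"LOWER": "apples"}]],
    attrs={"LEMMA": "apple"},
    index=-1,
)

doc = nlp("I ate two apples")
lemmas = [token.lemma_ for token in doc]
print(lemmas)
```

Only the final token of the matched span ("apples") receives the lemma; "two" is part of the match but is left unchanged.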
AttributeRuler.add_patterns
Add patterns from a list of pattern dicts with the keys as the arguments to
AttributeRuler.add.
Example
attribute_ruler = nlp.add_pipe("attribute_ruler")
pattern_dicts = [
    {
        "patterns": [[{"TAG": "VB"}]],
        "attrs": {"POS": "VERB"}
    },
    {
        "patterns": [[{"LOWER": "two"}, {"LOWER": "apples"}]],
        "attrs": {"LEMMA": "apple"},
        "index": -1
    },
]
attribute_ruler.add_patterns(pattern_dicts)
Name | Type | Description |
---|---|---|
pattern_dicts | Iterable[Dict] | The patterns to add. |
AttributeRuler.patterns
Get all patterns that have been added to the attribute ruler in the
pattern_dicts format accepted by AttributeRuler.add_patterns.
Name | Type | Description |
---|---|---|
RETURNS | List[dict] | The patterns added to the attribute ruler. |
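Because the returned list uses the same keys that AttributeRuler.add_patterns accepts, it can be fed into another ruler. A minimal sketch, assuming spaCy v3; the second ruler (named copy here) is constructed directly for illustration and is not part of any pipeline.

```python
import spacy
from spacy.pipeline import AttributeRuler

nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")
ruler.add(patterns=[[{"LOWER": "an"}]], attrs={"LEMMA": "a"})
ruler.add(
    patterns=[[{"LOWER": "two"}, {"LOWER": "apples"}]],
    attrs={"LEMMA": "apple"},
    index=-1,
)

# Each exported entry is a dict with "patterns", "attrs" and "index" keys.
exported = ruler.patterns

# Illustrative second ruler that shares the vocab and reuses the export.
copy = AttributeRuler(nlp.vocab)
copy.add_patterns(exported)
print(len(copy.patterns))
```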
AttributeRuler.load_from_tag_map
Load attribute ruler patterns from a tag map.
Name | Type | Description |
---|---|---|
tag_map | dict | The tag map that maps fine-grained tags to coarse-grained tags and morphological features. |
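A small tag map might be loaded as sketched below. This assumes spaCy v3; the tag map entries, the sample words and the hand-assigned tags are illustrative values, not taken from any shipped language data.

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")

# Illustrative tag map: keys are fine-grained tags, values hold the
# coarse-grained POS plus optional morphological features.
tag_map = {
    "VBZ": {"POS": "VERB"},
    ".": {"POS": "PUNCT", "PunctType": "peri"},
}
ruler.load_from_tag_map(tag_map)

# Tags are supplied by hand because the blank pipeline has no tagger.
doc = Doc(nlp.vocab, words=["This", "works", "."], tags=["DT", "VBZ", "."])
doc = ruler(doc)

for token in doc:
    print(token.text, token.tag_, token.pos_, str(token.morph))
```

Tags that do not appear in the tag map (here "DT") are left without a coarse-grained POS.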
AttributeRuler.load_from_morph_rules
Load attribute ruler patterns from morph rules.
Name | Type | Description |
---|---|---|
morph_rules | dict | The morph rules that map token text and fine-grained tags to coarse-grained tags, lemmas and morphological features. |
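Morph rules are keyed first by fine-grained tag and then by token text. The sketch below assumes spaCy v3; the rule contents, the sentence and the hand-assigned tags are made-up values chosen only to show the lookup.

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")

# Illustrative rule: a token tagged "DT" whose text is exactly "the".
morph_rules = {"DT": {"the": {"POS": "DET", "LEMMA": "a", "Case": "Nom"}}}
ruler.load_from_morph_rules(morph_rules)

doc = Doc(
    nlp.vocab,
    words=["This", "is", "the", "test", "."],
    tags=["DT", "VBZ", "DT", "NN", "."],
)
doc = ruler(doc)

print(doc[2].pos_, doc[2].lemma_, str(doc[2].morph))
```

"This" shares the "DT" tag but not the text, so only the third token is modified.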
AttributeRuler.to_disk
Serialize the pipe to disk.
Example
attribute_ruler = nlp.add_pipe("attribute_ruler")
attribute_ruler.to_disk("/path/to/attribute_ruler")
Name | Type | Description |
---|---|---|
path | str / Path | A path to a directory, which will be created if it doesn't exist. Paths may be either strings or Path-like objects. |
keyword-only | ||
exclude | Iterable[str] | String names of serialization fields to exclude. |
AttributeRuler.from_disk
Load the pipe from disk. Modifies the object in place and returns it.
Example
attribute_ruler = nlp.add_pipe("attribute_ruler")
attribute_ruler.from_disk("/path/to/attribute_ruler")
Name | Type | Description |
---|---|---|
path | str / Path | A path to a directory. Paths may be either strings or Path-like objects. |
keyword-only | ||
exclude | Iterable[str] | String names of serialization fields to exclude. |
RETURNS | AttributeRuler | The modified AttributeRuler object. |
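A disk round trip can be sketched with a temporary directory. This assumes spaCy v3; the temporary path and the directly constructed second ruler (named restored) are illustrative choices.

```python
import tempfile
from pathlib import Path

import spacy
from spacy.pipeline import AttributeRuler

nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")
ruler.add(patterns=[[{"LOWER": "an"}]], attrs={"LEMMA": "a"})

with tempfile.TemporaryDirectory() as tmp:
    # The target directory does not exist yet; to_disk creates it.
    path = Path(tmp) / "attribute_ruler"
    ruler.to_disk(path)
    # from_disk modifies the new ruler in place and returns it.
    restored = AttributeRuler(nlp.vocab).from_disk(path)

restored_patterns = restored.patterns
print(restored_patterns)
```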
AttributeRuler.to_bytes
Serialize the pipe to a bytestring.
Example
attribute_ruler = nlp.add_pipe("attribute_ruler")
attribute_ruler_bytes = attribute_ruler.to_bytes()
Name | Type | Description |
---|---|---|
keyword-only | ||
exclude | Iterable[str] | String names of serialization fields to exclude. |
RETURNS | bytes | The serialized form of the AttributeRuler object. |
AttributeRuler.from_bytes
Load the pipe from a bytestring. Modifies the object in place and returns it.
Example
attribute_ruler_bytes = attribute_ruler.to_bytes()
attribute_ruler = nlp.add_pipe("attribute_ruler")
attribute_ruler.from_bytes(attribute_ruler_bytes)
Name | Type | Description |
---|---|---|
bytes_data | bytes | The data to load from. |
keyword-only | ||
exclude | Iterable[str] | String names of serialization fields to exclude. |
RETURNS | AttributeRuler | The AttributeRuler object. |
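A bytestring round trip might look like the following sketch, which assumes spaCy v3; the second ruler (named restored) is constructed directly for illustration rather than added to a pipeline.

```python
import spacy
from spacy.pipeline import AttributeRuler

nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")
ruler.add(
    patterns=[[{"LOWER": "two"}, {"LOWER": "apples"}]],
    attrs={"LEMMA": "apple"},
    index=-1,
)

# Serialize, then load into a fresh ruler that shares the same vocab.
ruler_bytes = ruler.to_bytes()
restored = AttributeRuler(nlp.vocab).from_bytes(ruler_bytes)

print(restored.patterns == ruler.patterns)
```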
Serialization fields
During serialization, spaCy will export several data fields used to restore
different aspects of the object. If needed, you can exclude them from
serialization by passing in the string names via the exclude argument.
Example
attribute_ruler.to_disk("/path", exclude=["vocab"])
Name | Description |
---|---|
vocab | The shared Vocab. |
patterns | The Matcher patterns. You usually don't want to exclude this. |
attrs | The attributes to set. You usually don't want to exclude this. |
indices | The token indices. You usually don't want to exclude this. |
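Excluding the vocab can make sense when the receiving ruler already shares the same Vocab. A hedged sketch, assuming spaCy v3; the field names available for exclusion may differ between versions, so only "vocab" is used here.

```python
import spacy
from spacy.pipeline import AttributeRuler

nlp = spacy.blank("en")
ruler = nlp.add_pipe("attribute_ruler")
ruler.add(patterns=[[{"LOWER": "an"}]], attrs={"LEMMA": "a"})

full_bytes = ruler.to_bytes()
# Skip the shared vocab; the patterns are still serialized.
slim_bytes = ruler.to_bytes(exclude=["vocab"])

# The receiving ruler reuses nlp.vocab, so no vocab data is needed.
restored = AttributeRuler(nlp.vocab).from_bytes(slim_bytes, exclude=["vocab"])

print(len(full_bytes), len(slim_bytes))
```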