| title | tag | source | new | 
| EntityRuler | class | spacy/pipeline/entityruler.py | 2.1 | 
The EntityRuler lets you add spans to the Doc.ents using
token-based rules or exact phrase matches. It can be combined with the
statistical EntityRecognizer to boost accuracy, or
used on its own to implement a purely rule-based entity recognition system.
After initialization, the component is typically added to the processing
pipeline using nlp.add_pipe. For usage examples, see
the docs on
rule-based entity recognition.
EntityRuler.__init__
Initialize the entity ruler. If patterns are supplied here, they need to be a
list of dictionaries, each with a "label" and a "pattern" key. A pattern can
either be a token pattern (list) or a phrase pattern (string). For example:
{"label": "ORG", "pattern": "Apple"}.
Example
# Construction via create_pipe
ruler = nlp.create_pipe("entity_ruler")
# Construction from class
from spacy.pipeline import EntityRuler
ruler = EntityRuler(nlp, overwrite_ents=True)
| Name | Type | Description | 
| nlp | Language | The shared nlp object to pass the vocab to the matchers and process phrase patterns. | 
| patterns | iterable | Optional patterns to load in. | 
| phrase_matcher_attr | int / str | Optional attr to pass to the internal PhraseMatcher. Defaults to None. | 
| validate | bool | Whether patterns should be validated, passed to Matcher and PhraseMatcher as validate. Defaults to False. | 
| overwrite_ents | bool | If existing entities are present, e.g. entities added by the model, overwrite them by matches if necessary. Defaults to False. | 
| **cfg | - | Other config parameters. If pipeline component is loaded as part of a model pipeline, this will include all keyword arguments passed to spacy.load. | 
| RETURNS | EntityRuler | The newly constructed object. | 
EntityRuler.__len__
The number of all patterns added to the entity ruler.
Example
ruler = EntityRuler(nlp)
assert len(ruler) == 0
ruler.add_patterns([{"label": "ORG", "pattern": "Apple"}])
assert len(ruler) == 1
| Name | Type | Description | 
| RETURNS | int | The number of patterns. | 
EntityRuler.__contains__
Whether a label is present in the patterns.
Example
ruler = EntityRuler(nlp)
ruler.add_patterns([{"label": "ORG", "pattern": "Apple"}])
assert "ORG" in ruler
assert "PERSON" not in ruler
| Name | Type | Description | 
| label | str | The label to check. | 
| RETURNS | bool | Whether the entity ruler contains the label. | 
EntityRuler.__call__
Find matches in the Doc and add them to the doc.ents. Typically, this
happens automatically after the component has been added to the pipeline using
nlp.add_pipe. If the entity ruler was initialized
with overwrite_ents=True, existing entities will be replaced if they overlap
with the matches. When matches overlap in a Doc, the entity ruler prioritizes
longer matches over shorter ones, and if they are equal in length, the match
occurring first in the Doc is chosen.
Example
ruler = EntityRuler(nlp)
ruler.add_patterns([{"label": "ORG", "pattern": "Apple"}])
nlp.add_pipe(ruler)
doc = nlp("A text about Apple.")
ents = [(ent.text, ent.label_) for ent in doc.ents]
assert ents == [("Apple", "ORG")]
| Name | Type | Description | 
| doc | Doc | The Doc object to process, e.g. the Doc in the pipeline. | 
| RETURNS | Doc | The modified Doc with added entities, if available. | 
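The overlap rule described for EntityRuler.__call__ (longer matches win, and ties go to the match that occurs first) can be illustrated with a small standalone sketch. It is not the library's implementation: the function name resolve_overlaps and the (label, start, end) tuple layout are assumptions made for illustration, and it uses plain token indices instead of Doc and Span objects.

```python
from typing import List, Tuple

# Hypothetical match layout: (label, start token index, exclusive end token index)
Match = Tuple[str, int, int]


def resolve_overlaps(matches: List[Match]) -> List[Match]:
    """Prefer longer matches; among equally long ones, prefer the earlier start."""
    # Sort by descending length, then by ascending start position.
    ordered = sorted(matches, key=lambda m: (-(m[2] - m[1]), m[1]))
    taken = set()  # token indices already covered by an accepted match
    kept = []
    for label, start, end in ordered:
        span = range(start, end)
        if any(i in taken for i in span):
            continue  # overlaps a match that was accepted earlier
        taken.update(span)
        kept.append((label, start, end))
    # Return the surviving matches in document order.
    return sorted(kept, key=lambda m: m[1])


matches = [("GPE", 0, 2), ("ORG", 1, 2), ("ORG", 1, 3), ("PERSON", 4, 5)]
resolved = resolve_overlaps(matches)
print(resolved)
```

Here the two-token GPE match starting at token 0 and the two-token ORG match starting at token 1 have equal length, so the earlier one is kept and both overlapping ORG candidates are dropped.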
EntityRuler.add_patterns
Add patterns to the entity ruler. A pattern can either be a token pattern (list
of dicts) or a phrase pattern (string). For more details, see the usage guide on
rule-based matching.
Example
patterns = [
    {"label": "ORG", "pattern": "Apple"},
    {"label": "GPE", "pattern": [{"lower": "san"}, {"lower": "francisco"}]}
]
ruler = EntityRuler(nlp)
ruler.add_patterns(patterns)
| Name | Type | Description | 
| patterns | list | The patterns to add. | 
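To make the distinction between the two pattern kinds concrete, the following standalone sketch groups pattern entries by label, separating phrase patterns (strings) from token patterns (lists of dicts). The helper split_patterns is hypothetical and only mirrors the documented input format; it makes no claim about how the entity ruler stores patterns internally.

```python
from collections import defaultdict


def split_patterns(patterns):
    """Group pattern entries by label into token patterns and phrase patterns."""
    token_patterns = defaultdict(list)
    phrase_patterns = defaultdict(list)
    for entry in patterns:
        label, pattern = entry["label"], entry["pattern"]
        if isinstance(pattern, str):
            # A string is treated as an exact phrase pattern.
            phrase_patterns[label].append(pattern)
        elif isinstance(pattern, list):
            # A list of dicts is treated as a token pattern.
            token_patterns[label].append(pattern)
        else:
            raise ValueError(f"Unsupported pattern type: {type(pattern).__name__}")
    return dict(token_patterns), dict(phrase_patterns)


patterns = [
    {"label": "ORG", "pattern": "Apple"},
    {"label": "GPE", "pattern": [{"lower": "san"}, {"lower": "francisco"}]},
]
token_patterns, phrase_patterns = split_patterns(patterns)
print(token_patterns)
print(phrase_patterns)
```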
EntityRuler.to_disk
Save the entity ruler patterns to a file or directory. The patterns will be saved as
newline-delimited JSON (JSONL). If a file with the suffix .jsonl is provided,
only the patterns are saved as JSONL. If a directory name is provided, a
patterns.jsonl file and a cfg file with the component configuration are exported.
Example
ruler = EntityRuler(nlp)
ruler.to_disk("/path/to/patterns.jsonl")  # saves patterns only
ruler.to_disk("/path/to/entity_ruler")    # saves patterns and config
| Name | Type | Description | 
| path | str / Path | A path to a JSONL file or directory, which will be created if it doesn't exist. Paths may be either strings or Path-like objects. | 
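For readers unfamiliar with newline-delimited JSON, the sketch below shows what a patterns file in that format could look like, using only the standard library. The helpers write_jsonl and read_jsonl are hypothetical stand-ins for illustration, not the functions the entity ruler calls, and the exact key order or whitespace the library writes may differ.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(path, patterns):
    """Write one JSON object per line (newline-delimited JSON)."""
    with Path(path).open("w", encoding="utf8") as f:
        for entry in patterns:
            f.write(json.dumps(entry) + "\n")


def read_jsonl(path):
    """Read one JSON object per non-empty line."""
    with Path(path).open("r", encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]


patterns = [
    {"label": "ORG", "pattern": "Apple"},
    {"label": "GPE", "pattern": [{"lower": "san"}, {"lower": "francisco"}]},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    jsonl_path = Path(tmp_dir) / "patterns.jsonl"
    write_jsonl(jsonl_path, patterns)
    raw_lines = jsonl_path.read_text(encoding="utf8").splitlines()
    loaded = read_jsonl(jsonl_path)

print(raw_lines)
```

Each pattern occupies exactly one line, so a file can be appended to or streamed line by line without parsing the whole document.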
EntityRuler.from_disk
Load the entity ruler from a file. Expects either a file containing
newline-delimited JSON (JSONL) with one entry per line, or a directory
containing a patterns.jsonl file and a cfg file with the component
configuration.
Example
ruler = EntityRuler(nlp)
ruler.from_disk("/path/to/patterns.jsonl")  # loads patterns only
ruler.from_disk("/path/to/entity_ruler")    # loads patterns and config
| Name | Type | Description | 
| path | str / Path | A path to a JSONL file or directory. Paths may be either strings or Path-like objects. | 
| RETURNS | EntityRuler | The modified EntityRuler object. | 
EntityRuler.to_bytes
Serialize the entity ruler patterns to a bytestring.
Example
ruler = EntityRuler(nlp)
ruler_bytes = ruler.to_bytes()
| Name | Type | Description | 
| RETURNS | bytes | The serialized patterns. | 
EntityRuler.from_bytes
Load the pipe from a bytestring. Modifies the object in place and returns it.
Example
ruler_bytes = ruler.to_bytes()
ruler = EntityRuler(nlp)
ruler.from_bytes(ruler_bytes)
| Name | Type | Description | 
| patterns_bytes | bytes | The bytestring to load. | 
| RETURNS | EntityRuler | The modified EntityRuler object. | 
EntityRuler.labels
All labels present in the match patterns.
| Name | Type | Description | 
| RETURNS | tuple | The string labels. | 
EntityRuler.ent_ids
All entity IDs present in the id properties of the match patterns.
| Name | Type | Description | 
| RETURNS | tuple | The string ent_ids. | 
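As a rough illustration of what the labels and ent_ids properties report, the sketch below derives both tuples from a plain list of pattern dictionaries. It assumes the optional identifier is stored under an "id" key in each pattern entry; the helper names are hypothetical, and the results are sorted only to make the output deterministic, which says nothing about the order the library returns.

```python
def collect_labels(patterns):
    """Return the distinct labels found in the pattern entries."""
    return tuple(sorted({entry["label"] for entry in patterns}))


def collect_ent_ids(patterns):
    """Return the distinct ids of entries that define one (assumed key: "id")."""
    return tuple(sorted({entry["id"] for entry in patterns if "id" in entry}))


patterns = [
    {"label": "ORG", "pattern": "Apple", "id": "apple"},
    {"label": "ORG", "pattern": "Microsoft"},  # no id on this entry
    {"label": "GPE", "pattern": "San Francisco", "id": "san-francisco"},
]
labels = collect_labels(patterns)
ent_ids = collect_ent_ids(patterns)
print(labels, ent_ids)
```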
EntityRuler.patterns
Get all patterns that were added to the entity ruler.
| Name | Type | Description | 
| RETURNS | list | The original patterns, one dictionary per pattern. | 
Attributes
| Name | Type | Description | 
| matcher | Matcher | The underlying matcher used to process token patterns. | 
| phrase_matcher | PhraseMatcher | The underlying phrase matcher, used to process phrase patterns. | 
| token_patterns | dict | The token patterns present in the entity ruler, keyed by label. | 
| phrase_patterns | dict | The phrase patterns present in the entity ruler, keyed by label. |