mirror of https://github.com/explosion/spaCy.git (synced 2025-02-22 22:40:32 +03:00)
Merge remote-tracking branch 'origin/master' into feature/improve-pretrain
commit 794870b5a5

````diff
@@ -986,37 +986,6 @@ doc = nlp("Apple is opening its first big office in San Francisco.")
 print([(ent.text, ent.label_) for ent in doc.ents])
 ```
 
-### Adding IDs to patterns {#entityruler-ent-ids new="2.2.2"}
-
-The [`EntityRuler`](/api/entityruler) can also accept an `id` attribute for each
-pattern. Using the `id` attribute allows multiple patterns to be associated with
-the same entity.
-
-```python
-### {executable="true"}
-from spacy.lang.en import English
-from spacy.pipeline import EntityRuler
-
-nlp = English()
-ruler = EntityRuler(nlp)
-patterns = [{"label": "ORG", "pattern": "Apple", "id": "apple"},
-            {"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "francisco"}], "id": "san-francisco"},
-            {"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "fran"}], "id": "san-francisco"}]
-ruler.add_patterns(patterns)
-nlp.add_pipe(ruler)
-
-doc1 = nlp("Apple is opening its first big office in San Francisco.")
-print([(ent.text, ent.label_, ent.ent_id_) for ent in doc1.ents])
-
-doc2 = nlp("Apple is opening its first big office in San Fran.")
-print([(ent.text, ent.label_, ent.ent_id_) for ent in doc2.ents])
-```
-
-If the `id` attribute is included in the [`EntityRuler`](/api/entityruler)
-patterns, the `ent_id_` property of the matched entity is set to the `id` given
-in the patterns. So in the example above it's easy to identify that "San
-Francisco" and "San Fran" are both the same entity.
-
 The entity ruler is designed to integrate with spaCy's existing statistical
 models and enhance the named entity recognizer. If it's added **before the
 `"ner"` component**, the entity recognizer will respect the existing entity
````

````diff
@@ -1051,6 +1020,37 @@ The `EntityRuler` can validate patterns against a JSON schema with the option
 ruler = EntityRuler(nlp, validate=True)
 ```
 
+### Adding IDs to patterns {#entityruler-ent-ids new="2.2.2"}
+
+The [`EntityRuler`](/api/entityruler) can also accept an `id` attribute for each
+pattern. Using the `id` attribute allows multiple patterns to be associated with
+the same entity.
+
+```python
+### {executable="true"}
+from spacy.lang.en import English
+from spacy.pipeline import EntityRuler
+
+nlp = English()
+ruler = EntityRuler(nlp)
+patterns = [{"label": "ORG", "pattern": "Apple", "id": "apple"},
+            {"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "francisco"}], "id": "san-francisco"},
+            {"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "fran"}], "id": "san-francisco"}]
+ruler.add_patterns(patterns)
+nlp.add_pipe(ruler)
+
+doc1 = nlp("Apple is opening its first big office in San Francisco.")
+print([(ent.text, ent.label_, ent.ent_id_) for ent in doc1.ents])
+
+doc2 = nlp("Apple is opening its first big office in San Fran.")
+print([(ent.text, ent.label_, ent.ent_id_) for ent in doc2.ents])
+```
+
+If the `id` attribute is included in the [`EntityRuler`](/api/entityruler)
+patterns, the `ent_id_` property of the matched entity is set to the `id` given
+in the patterns. So in the example above it's easy to identify that "San
+Francisco" and "San Fran" are both the same entity.
+
 ### Using pattern files {#entityruler-files}
 
 The [`to_disk`](/api/entityruler#to_disk) and
````
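The idea behind the `id` attribute (several patterns, one identity) can be sketched without spaCy. What follows is a minimal pure-Python illustration under stated assumptions, not spaCy's implementation: patterns are hypothetical `(label, tokens, id)` tuples, text is pre-split on whitespace instead of going through spaCy's tokenizer, and the helpers `find_entities` and `group_by_id` are made-up names for this example. In spaCy, the corresponding value is read from `ent.ent_id_` on each matched entity.

```python
from collections import defaultdict

# Hypothetical pattern records mirroring the dict patterns in the diff:
# (label, lowercased token sequence, id). This is NOT spaCy's internal format.
PATTERNS = [
    ("ORG", ("apple",), "apple"),
    ("GPE", ("san", "francisco"), "san-francisco"),
    ("GPE", ("san", "fran"), "san-francisco"),
]


def find_entities(tokens):
    """Return (text, label, id) for each match, scanning left to right.

    At every position the longest pattern is tried first; once a match
    consumes tokens, the scan resumes after it. Illustration only.
    """
    lowered = [t.lower() for t in tokens]
    ordered = sorted(PATTERNS, key=lambda p: len(p[1]), reverse=True)
    entities = []
    i = 0
    while i < len(tokens):
        for label, seq, ent_id in ordered:
            if tuple(lowered[i:i + len(seq)]) == seq:
                entities.append((" ".join(tokens[i:i + len(seq)]), label, ent_id))
                i += len(seq)
                break
        else:
            # No pattern starts at this token; move on.
            i += 1
    return entities


def group_by_id(entities):
    """Map each id to the sorted surface forms that matched under it."""
    groups = defaultdict(set)
    for text, _label, ent_id in entities:
        groups[ent_id].add(text)
    return {ent_id: sorted(texts) for ent_id, texts in groups.items()}


# Simplified whitespace tokenization (assumption): the final "." is pre-split.
doc1 = "Apple is opening its first big office in San Francisco .".split()
doc2 = "Apple is opening its first big office in San Fran .".split()
ents = find_entities(doc1) + find_entities(doc2)
print(group_by_id(ents))
# {'apple': ['Apple'], 'san-francisco': ['San Fran', 'San Francisco']}
```

Grouping on the id rather than on the surface text is what lets downstream code treat "San Francisco" and "San Fran" as one entity.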