Mirror of https://github.com/explosion/spaCy.git (synced 2024-12-26 01:46:28 +03:00)
Adjust order of docs sections [ci skip]
parent 74b951fe61
commit 5adcb352e9
@@ -986,37 +986,6 @@ doc = nlp("Apple is opening its first big office in San Francisco.")
 print([(ent.text, ent.label_) for ent in doc.ents])
 ```
 
-### Adding IDs to patterns {#entityruler-ent-ids new="2.2.2"}
-
-The [`EntityRuler`](/api/entityruler) can also accept an `id` attribute for each
-pattern. Using the `id` attribute allows multiple patterns to be associated with
-the same entity.
-
-```python
-### {executable="true"}
-from spacy.lang.en import English
-from spacy.pipeline import EntityRuler
-
-nlp = English()
-ruler = EntityRuler(nlp)
-patterns = [{"label": "ORG", "pattern": "Apple", "id": "apple"},
-            {"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "francisco"}], "id": "san-francisco"},
-            {"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "fran"}], "id": "san-francisco"}]
-ruler.add_patterns(patterns)
-nlp.add_pipe(ruler)
-
-doc1 = nlp("Apple is opening its first big office in San Francisco.")
-print([(ent.text, ent.label_, ent.ent_id_) for ent in doc1.ents])
-
-doc2 = nlp("Apple is opening its first big office in San Fran.")
-print([(ent.text, ent.label_, ent.ent_id_) for ent in doc2.ents])
-```
-
-If the `id` attribute is included in the [`EntityRuler`](/api/entityruler)
-patterns, the `ent_id_` property of the matched entity is set to the `id` given
-in the patterns. So in the example above it's easy to identify that "San
-Francisco" and "San Fran" are both the same entity.
-
 The entity ruler is designed to integrate with spaCy's existing statistical
 models and enhance the named entity recognizer. If it's added **before the
 `"ner"` component**, the entity recognizer will respect the existing entity
@@ -1051,6 +1020,37 @@ The `EntityRuler` can validate patterns against a JSON schema with the option
 ruler = EntityRuler(nlp, validate=True)
 ```
 
+### Adding IDs to patterns {#entityruler-ent-ids new="2.2.2"}
+
+The [`EntityRuler`](/api/entityruler) can also accept an `id` attribute for each
+pattern. Using the `id` attribute allows multiple patterns to be associated with
+the same entity.
+
+```python
+### {executable="true"}
+from spacy.lang.en import English
+from spacy.pipeline import EntityRuler
+
+nlp = English()
+ruler = EntityRuler(nlp)
+patterns = [{"label": "ORG", "pattern": "Apple", "id": "apple"},
+            {"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "francisco"}], "id": "san-francisco"},
+            {"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "fran"}], "id": "san-francisco"}]
+ruler.add_patterns(patterns)
+nlp.add_pipe(ruler)
+
+doc1 = nlp("Apple is opening its first big office in San Francisco.")
+print([(ent.text, ent.label_, ent.ent_id_) for ent in doc1.ents])
+
+doc2 = nlp("Apple is opening its first big office in San Fran.")
+print([(ent.text, ent.label_, ent.ent_id_) for ent in doc2.ents])
+```
+
+If the `id` attribute is included in the [`EntityRuler`](/api/entityruler)
+patterns, the `ent_id_` property of the matched entity is set to the `id` given
+in the patterns. So in the example above it's easy to identify that "San
+Francisco" and "San Fran" are both the same entity.
+
 ### Using pattern files {#entityruler-files}
 
 The [`to_disk`](/api/entityruler#to_disk) and
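The moved section says that a shared `id` lets differently worded matches such as "San Francisco" and "San Fran" be recognized as one entity. The plain-Python sketch below shows one way to use that once the `ent_id_` values are available. It assumes `(text, label, ent_id)` triples shaped like the section's `print` output. The triples are hypothetical illustrative values, not captured spaCy output. The helper `group_by_ent_id` is also hypothetical and is not part of spaCy.

```python
from collections import defaultdict

# Hypothetical (text, label, ent_id) triples, shaped like
# [(ent.text, ent.label_, ent.ent_id_) for ent in doc.ents] from the example.
# Illustrative values only, not real spaCy output.
mentions = [
    ("Apple", "ORG", "apple"),
    ("San Francisco", "GPE", "san-francisco"),
    ("Apple", "ORG", "apple"),
    ("San Fran", "GPE", "san-francisco"),
]


def group_by_ent_id(triples):
    """Map each pattern id to the distinct surface forms that matched it.

    Entities with an empty id (assumed here to mean that the pattern had
    no "id") are skipped.
    """
    groups = defaultdict(list)
    for text, _label, ent_id in triples:
        if ent_id and text not in groups[ent_id]:
            groups[ent_id].append(text)
    return dict(groups)


print(group_by_ent_id(mentions))
# {'apple': ['Apple'], 'san-francisco': ['San Francisco', 'San Fran']}
```

Grouping on the id rather than on the label keeps "San Francisco" and "San Fran" together without merging them with other `GPE` entities.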