Merge remote-tracking branch 'origin/master' into feature/improve-pretrain

2025-12-14 05:34:16 +03:00 · 2019-11-17 17:19:34 +01:00 · 2019-11-17 17:19:34 +01:00 · 794870b5a5
commit 794870b5a5
parent a48e69d8d1 5adcb352e9
1 changed file with 31 additions and 31 deletions
--- a/website/docs/usage/rule-based-matching.md
+++ b/website/docs/usage/rule-based-matching.md
@@ -986,37 +986,6 @@ doc = nlp("Apple is opening its first big office in San Francisco.")
 print([(ent.text, ent.label_) for ent in doc.ents])
 ```
-### Adding IDs to patterns {#entityruler-ent-ids new="2.2.2"}
-The [`EntityRuler`](/api/entityruler) can also accept an `id` attribute for each
-pattern. Using the `id` attribute allows multiple patterns to be associated with
-the same entity.
-```python
-### {executable="true"}
-from spacy.lang.en import English
-from spacy.pipeline import EntityRuler
-nlp = English()
-ruler = EntityRuler(nlp)
-patterns = [{"label": "ORG", "pattern": "Apple", "id": "apple"},
-           {"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "francisco"}], "id": "san-francisco"},
-           {"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "fran"}], "id": "san-francisco"}]
-ruler.add_patterns(patterns)
-nlp.add_pipe(ruler)
-doc1 = nlp("Apple is opening its first big office in San Francisco.")
-print([(ent.text, ent.label_, ent.ent_id_) for ent in doc1.ents])
-doc2 = nlp("Apple is opening its first big office in San Fran.")
-print([(ent.text, ent.label_, ent.ent_id_) for ent in doc2.ents])
-```
-If the `id` attribute is included in the [`EntityRuler`](/api/entityruler)
-patterns, the `ent_id_` property of the matched entity is set to the `id` given
-in the patterns. So in the example above it's easy to identify that "San
-Francisco" and "San Fran" are both the same entity.
 The entity ruler is designed to integrate with spaCy's existing statistical
 models and enhance the named entity recognizer. If it's added **before the
 `"ner"` component**, the entity recognizer will respect the existing entity
@@ -1051,6 +1020,37 @@ The `EntityRuler` can validate patterns against a JSON schema with the option
 ruler = EntityRuler(nlp, validate=True)
 ```
+### Adding IDs to patterns {#entityruler-ent-ids new="2.2.2"}
+The [`EntityRuler`](/api/entityruler) can also accept an `id` attribute for each
+pattern. Using the `id` attribute allows multiple patterns to be associated with
+the same entity.
+```python
+### {executable="true"}
+from spacy.lang.en import English
+from spacy.pipeline import EntityRuler
+nlp = English()
+ruler = EntityRuler(nlp)
+patterns = [{"label": "ORG", "pattern": "Apple", "id": "apple"},
+           {"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "francisco"}], "id": "san-francisco"},
+           {"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "fran"}], "id": "san-francisco"}]
+ruler.add_patterns(patterns)
+nlp.add_pipe(ruler)
+doc1 = nlp("Apple is opening its first big office in San Francisco.")
+print([(ent.text, ent.label_, ent.ent_id_) for ent in doc1.ents])
+doc2 = nlp("Apple is opening its first big office in San Fran.")
+print([(ent.text, ent.label_, ent.ent_id_) for ent in doc2.ents])
+```
+If the `id` attribute is included in the [`EntityRuler`](/api/entityruler)
+patterns, the `ent_id_` property of the matched entity is set to the `id` given
+in the patterns. So in the example above it's easy to identify that "San
+Francisco" and "San Fran" are both the same entity.
 ### Using pattern files {#entityruler-files}
 The [`to_disk`](/api/entityruler#to_disk) and