WIP: update docs [ci skip]

2025-07-22 05:59:56 +03:00 · 2020-09-04 16:30:31 +02:00 · 2020-09-04 16:30:31 +02:00 · 157caf4dfa
commit 157caf4dfa
parent f174c7b1f3
5 changed files with 154 additions and 176 deletions
--- a/website/docs/api/dependencymatcher.md
+++ b/website/docs/api/dependencymatcher.md
@@ -11,7 +11,8 @@ and [`PhraseMatcher`](/api/phrasematcher) and lets you match on dependency trees
 using
 [Semgrex operators](https://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/semgraph/semgrex/SemgrexPattern.html).
 It requires a pretrained [`DependencyParser`](/api/parser) or other component
-that sets the `Token.dep` and `Token.head` attributes.
+that sets the `Token.dep` and `Token.head` attributes. See the
 [usage guide](/usage/rule-based-matching#dependencymatcher) for examples.
 ## Pattern format {#patterns}
@@ -48,63 +49,18 @@ dictionary, which defines an anchor token using only `RIGHT_ID` and
 | Name          | Description                                                                                                                                                            |
 | ------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `LEFT_ID`     | The name of the left-hand node in the relation, which has been defined in an earlier node.                                                                             |
+| `LEFT_ID`     | The name of the left-hand node in the relation, which has been defined in an earlier node. ~~str~~                                                                     |
 | `REL_OP`      | An operator that describes how the two nodes are related. ~~str~~                                                                                                      |
 | `RIGHT_ID`    | A unique name for the right-hand node in the relation. ~~str~~                                                                                                         |
 | `RIGHT_ATTRS` | The token attributes to match for the right-hand node in the same format as patterns provided to the regular token-based [`Matcher`](/api/matcher). ~~Dict[str, Any]~~ |
-The first pattern defines an anchor token and each additional token added to the
+<Infobox title="Designing dependency matcher patterns" emoji="📖">
 pattern is linked to an existing token `LEFT_ID` by the relation `REL_OP` and is
 described by the name `RIGHT_ID` and the attributes `RIGHT_ATTRS`.
-Let's say we want to find sentences describing who founded what kind of company:
+For examples of how to construct dependency matcher patterns for different types
 of relations, see the usage guide on
 [dependency matching](/usage/rule-based-matching#dependencymatcher).
- `Smith founded a healthcare company in 2005.`
+</Infobox>
 - `Williams initially founded an insurance company in 1987.`
 - `Lee, an established CEO, founded yet another AI startup.`
 Since it's the root of the dependency parse, `founded` is a good choice for the
 anchor token in our pattern:
 ```python
 pattern = [
    {"RIGHT_ID": "anchor_founded", "RIGHT_ATTRS": {"ORTH": "founded"}}
 ]
 ```
 We can add the subject as the token with the dependency label `nsubj` that is a
 direct child `>` of the anchor token named `anchor_founded`:
 ```python
 pattern = [
    {"RIGHT_ID": "anchor_founded", "RIGHT_ATTRS": {"ORTH": "founded"}},
    {
        "LEFT_ID": "anchor_founded",
        "REL_OP": ">",
        "RIGHT_ID": "subject",
        "RIGHT_ATTRS": {"DEP": "nsubj"},
    }
 ]
 ```
 And the direct object along with its modifier:
 ```python
 pattern = [ ...
    {
        "LEFT_ID": "anchor_founded",
        "REL_OP": ">",
        "RIGHT_ID": "founded_object",
        "RIGHT_ATTRS": {"DEP": "dobj"},
    },
    {
        "LEFT_ID": "founded_object",
        "REL_OP": ">",
        "RIGHT_ID": "founded_object_modifier",
        "RIGHT_ATTRS": {"DEP": {"IN": ["amod", "compound"]}},
    }
 ]
 ```
 ### Operators
@@ -113,19 +69,19 @@ come directly from
 [Semgrex](https://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/semgraph/semgrex/SemgrexPattern.html):
 | Symbol    | Description                                                                                                          |
-| --------- | ------------------------------------------------------------------------------------------------------------------- |
+| --------- | -------------------------------------------------------------------------------------------------------------------- |
-| `A < B`   | `A` is the immediate dependent of `B`                                                                               |
+| `A < B`   | `A` is the immediate dependent of `B`.                                                                               |
-| `A > B`   | `A` is the immediate head of `B`                                                                                    |
+| `A > B`   | `A` is the immediate head of `B`.                                                                                    |
-| `A << B`  | `A` is the dependent in a chain to `B` following dep->head paths                                                    |
+| `A << B`  | `A` is the dependent in a chain to `B` following dep &rarr; head paths.                                              |
-| `A >> B`  | `A` is the head in a chain to `B` following head->dep paths                                                         |
+| `A >> B`  | `A` is the head in a chain to `B` following head &rarr; dep paths.                                                   |
-| `A . B`   | `A` immediately precedes `B`, i.e. `A.i == B.i - 1`, and both are within the same dependency tree                   |
+| `A . B`   | `A` immediately precedes `B`, i.e. `A.i == B.i - 1`, and both are within the same dependency tree.                   |
-| `A .* B`  | `A` precedes `B`, i.e. `A.i < B.i`, and both are within the same dependency tree _(not in Semgrex)_                 |
+| `A .* B`  | `A` precedes `B`, i.e. `A.i < B.i`, and both are within the same dependency tree _(not in Semgrex)_.                 |
-| `A ; B`   | `A` immediately follows `B`, i.e. `A.i == B.i + 1`, and both are within the same dependency tree _(not in Semgrex)_ |
+| `A ; B`   | `A` immediately follows `B`, i.e. `A.i == B.i + 1`, and both are within the same dependency tree _(not in Semgrex)_. |
-| `A ;* B`  | `A` follows `B`, i.e. `A.i > B.i`, and both are within the same dependency tree _(not in Semgrex)_                  |
+| `A ;* B`  | `A` follows `B`, i.e. `A.i > B.i`, and both are within the same dependency tree _(not in Semgrex)_.                  |
-| `A $+ B`  | `B` is a right immediate sibling of `A`, i.e. `A` and `B` have the same parent and `A.i == B.i - 1`                 |
+| `A $+ B`  | `B` is a right immediate sibling of `A`, i.e. `A` and `B` have the same parent and `A.i == B.i - 1`.                 |
-| `A $- B`  | `B` is a left immediate sibling of `A`, i.e. `A` and `B` have the same parent and `A.i == B.i + 1`                  |
+| `A $- B`  | `B` is a left immediate sibling of `A`, i.e. `A` and `B` have the same parent and `A.i == B.i + 1`.                  |
-| `A $++ B` | `B` is a right sibling of `A`, i.e. `A` and `B` have the same parent and `A.i < B.i`                                |
+| `A $++ B` | `B` is a right sibling of `A`, i.e. `A` and `B` have the same parent and `A.i < B.i`.                                |
-| `A $-- B` | `B` is a left sibling of `A`, i.e. `A` and `B` have the same parent and `A.i > B.i`                                 |
+| `A $-- B` | `B` is a left sibling of `A`, i.e. `A` and `B` have the same parent and `A.i > B.i`.                                 |
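To make the operator semantics above concrete, here is a minimal plain-Python sketch over a hand-written toy parse. It is not how spaCy implements the `DependencyMatcher`. The `heads` list and the helper names (`is_head`, `is_dependent`, `is_ancestor`, `is_right_sibling`) are assumptions made up for this illustration.

```python
# Toy parse of "Smith founded a healthcare company".
# heads[i] is the index of token i's head. The root points to itself,
# which is an assumption of this sketch (it mirrors Token.head in spaCy).
words = ["Smith", "founded", "a", "healthcare", "company"]
heads = [1, 1, 4, 4, 1]


def is_head(a, b):
    """A > B: A is the immediate head of B."""
    return a != b and heads[b] == a


def is_dependent(a, b):
    """A < B: A is the immediate dependent of B."""
    return is_head(b, a)


def is_ancestor(a, b):
    """A >> B: A is the head in a chain of head -> dep links reaching B."""
    node = b
    while heads[node] != node:  # climb towards the root
        node = heads[node]
        if node == a:
            return True
    return False


def is_right_sibling(a, b):
    """A $++ B: A and B share a parent and A.i < B.i."""
    both_have_parent = heads[a] != a and heads[b] != b
    return both_have_parent and heads[a] == heads[b] and a < b


print(is_head(1, 0))           # founded > Smith
print(is_ancestor(1, 3))       # founded >> healthcare
print(is_right_sibling(0, 4))  # Smith $++ company
```

All three calls print `True` for this toy parse. The remaining operators follow the same idea by comparing token indices, heads, or both.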
 ## DependencyMatcher.\_\_init\_\_ {#init tag="method"}
--- a/website/docs/images/dep-match-diagram.svg
+++ b/website/docs/images/dep-match-diagram.svg
--- a/website/docs/images/displacy-dep-founded.html
+++ b/website/docs/images/displacy-dep-founded.html
@@ -20,7 +20,7 @@
 </text>
 <text class="displacy-token" fill="currentColor" text-anchor="middle" y="309.5">
-    <tspan class="displacy-word" fill="currentColor" x="750">company.</tspan>
+    <tspan class="displacy-word" fill="currentColor" x="750">company</tspan>
    <tspan class="displacy-tag" dy="2em" fill="currentColor" x="750"></tspan>
 </text>
--- a/website/docs/usage/rule-based-matching.md
+++ b/website/docs/usage/rule-based-matching.md
@@ -974,10 +974,12 @@ to match phrases with the same sequence of punctuation and non-punctuation
 tokens as the pattern. But this can easily get confusing and doesn't have much
 of an advantage over writing one or two token patterns.
-## Dependency Matcher {#dependencymatcher new="3"}
+## Dependency Matcher {#dependencymatcher new="3" model="parser"}
 The [`DependencyMatcher`](/api/dependencymatcher) lets you match patterns within
-the dependency parse. It requires a model containing a parser such as the
+the dependency parse using
 [Semgrex](https://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/semgraph/semgrex/SemgrexPattern.html)
 operators. It requires a model containing a parser such as the
 [`DependencyParser`](/api/dependencyparser). Instead of defining a list of
 adjacent tokens as in `Matcher` patterns, the `DependencyMatcher` patterns match
 tokens in the dependency parse and specify the relations between them.
@@ -1014,15 +1016,15 @@ tokens in the dependency parse and specify the relations between them.
 > matches = matcher(doc)
 > ```
-A pattern added to the `DependencyMatcher` consists of a list of dictionaries,
+A pattern added to the dependency matcher consists of a **list of
-with each dictionary describing a token to match and its relation to an existing
+dictionaries**, with each dictionary describing a **token to match** and its
-token in the pattern. Except for the first dictionary, which defines an anchor
+**relation to an existing token** in the pattern. Except for the first
-token using only `RIGHT_ID` and `RIGHT_ATTRS`, each pattern should have the
+dictionary, which defines an anchor token using only `RIGHT_ID` and
-following keys:
+`RIGHT_ATTRS`, each pattern should have the following keys:
 | Name          | Description                                                                                                                                                            |
 | ------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `LEFT_ID`     | The name of the left-hand node in the relation, which has been defined in an earlier node.                                                                             |
+| `LEFT_ID`     | The name of the left-hand node in the relation, which has been defined in an earlier node. ~~str~~                                                                     |
 | `REL_OP`      | An operator that describes how the two nodes are related. ~~str~~                                                                                                      |
 | `RIGHT_ID`    | A unique name for the right-hand node in the relation. ~~str~~                                                                                                         |
 | `RIGHT_ATTRS` | The token attributes to match for the right-hand node in the same format as patterns provided to the regular token-based [`Matcher`](/api/matcher). ~~Dict[str, Any]~~ |
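The key rules in the table above can be sketched as a small structural check. `check_pattern` is a hypothetical helper written for this illustration. It only encodes the rules stated here and is not how spaCy validates patterns.

```python
def check_pattern(pattern):
    """Return a list of problems found in a dependency matcher pattern.

    Hypothetical helper for illustration only.
    """
    problems = []
    seen_ids = set()
    for i, node in enumerate(pattern):
        keys = set(node)
        # The anchor (first dict) takes only RIGHT_ID and RIGHT_ATTRS.
        required = {"RIGHT_ID", "RIGHT_ATTRS"}
        if i > 0:
            required |= {"LEFT_ID", "REL_OP"}
        missing = required - keys
        extra = keys - required
        if missing:
            problems.append(f"node {i}: missing {sorted(missing)}")
        if extra:
            problems.append(f"node {i}: unexpected {sorted(extra)}")
        # LEFT_ID must name a node defined earlier in the list.
        if i > 0 and "LEFT_ID" in node and node["LEFT_ID"] not in seen_ids:
            problems.append(f"node {i}: unknown LEFT_ID {node['LEFT_ID']!r}")
        # RIGHT_ID must be unique within the pattern.
        right_id = node.get("RIGHT_ID")
        if right_id in seen_ids:
            problems.append(f"node {i}: duplicate RIGHT_ID {right_id!r}")
        elif right_id is not None:
            seen_ids.add(right_id)
    return problems


pattern = [
    {"RIGHT_ID": "anchor_founded", "RIGHT_ATTRS": {"ORTH": "founded"}},
    {
        "LEFT_ID": "anchor_founded",
        "REL_OP": ">",
        "RIGHT_ID": "subject",
        "RIGHT_ATTRS": {"DEP": "nsubj"},
    },
]
print(check_pattern(pattern))  # []
```

A dict whose `LEFT_ID` names a node that has not been defined earlier in the list would be reported as a problem, which is the ordering constraint the table describes.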
@@ -1040,54 +1042,68 @@ can be used as `LEFT_ID` in another dict.
 </Infobox>
-### Dependency matcher operators
+### Dependency matcher operators {#dependencymatcher-operators}
 The following operators are supported by the `DependencyMatcher`, most of which
 come directly from
 [Semgrex](https://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/semgraph/semgrex/SemgrexPattern.html):
 | Symbol    | Description                                                                                                          |
-| --------- | ------------------------------------------------------------------------------------------------------------------- |
+| --------- | -------------------------------------------------------------------------------------------------------------------- |
-| `A < B`   | `A` is the immediate dependent of `B`                                                                               |
+| `A < B`   | `A` is the immediate dependent of `B`.                                                                               |
-| `A > B`   | `A` is the immediate head of `B`                                                                                    |
+| `A > B`   | `A` is the immediate head of `B`.                                                                                    |
-| `A << B`  | `A` is the dependent in a chain to `B` following dep->head paths                                                    |
+| `A << B`  | `A` is the dependent in a chain to `B` following dep &rarr; head paths.                                              |
-| `A >> B`  | `A` is the head in a chain to `B` following head->dep paths                                                         |
+| `A >> B`  | `A` is the head in a chain to `B` following head &rarr; dep paths.                                                   |
-| `A . B`   | `A` immediately precedes `B`, i.e. `A.i == B.i - 1`, and both are within the same dependency tree                   |
+| `A . B`   | `A` immediately precedes `B`, i.e. `A.i == B.i - 1`, and both are within the same dependency tree.                   |
-| `A .* B`  | `A` precedes `B`, i.e. `A.i < B.i`, and both are within the same dependency tree _(not in Semgrex)_                 |
+| `A .* B`  | `A` precedes `B`, i.e. `A.i < B.i`, and both are within the same dependency tree _(not in Semgrex)_.                 |
-| `A ; B`   | `A` immediately follows `B`, i.e. `A.i == B.i + 1`, and both are within the same dependency tree _(not in Semgrex)_ |
+| `A ; B`   | `A` immediately follows `B`, i.e. `A.i == B.i + 1`, and both are within the same dependency tree _(not in Semgrex)_. |
-| `A ;* B`  | `A` follows `B`, i.e. `A.i > B.i`, and both are within the same dependency tree _(not in Semgrex)_                  |
+| `A ;* B`  | `A` follows `B`, i.e. `A.i > B.i`, and both are within the same dependency tree _(not in Semgrex)_.                  |
-| `A $+ B`  | `B` is a right immediate sibling of `A`, i.e. `A` and `B` have the same parent and `A.i == B.i - 1`                 |
+| `A $+ B`  | `B` is a right immediate sibling of `A`, i.e. `A` and `B` have the same parent and `A.i == B.i - 1`.                 |
-| `A $- B`  | `B` is a left immediate sibling of `A`, i.e. `A` and `B` have the same parent and `A.i == B.i + 1`                  |
+| `A $- B`  | `B` is a left immediate sibling of `A`, i.e. `A` and `B` have the same parent and `A.i == B.i + 1`.                  |
-| `A $++ B` | `B` is a right sibling of `A`, i.e. `A` and `B` have the same parent and `A.i < B.i`                                |
+| `A $++ B` | `B` is a right sibling of `A`, i.e. `A` and `B` have the same parent and `A.i < B.i`.                                |
-| `A $-- B` | `B` is a left sibling of `A`, i.e. `A` and `B` have the same parent and `A.i > B.i`                                 |
+| `A $-- B` | `B` is a left sibling of `A`, i.e. `A` and `B` have the same parent and `A.i > B.i`.                                 |
-### Designing dependency matcher patterns
+### Designing dependency matcher patterns {#dependencymatcher-patterns}
 Let's say we want to find sentences describing who founded what kind of company:
- `Smith founded a healthcare company in 2005.`
+- _Smith founded a healthcare company in 2005._
- `Williams initially founded an insurance company in 1987.`
+- _Williams initially founded an insurance company in 1987._
- `Lee, an experienced CEO, has founded two AI startups.`
+- _Lee, an experienced CEO, has founded two AI startups._
-The dependency parse for `Smith founded a healthcare company` shows types of
+The dependency parse for "Smith founded a healthcare company" shows the types of
 relations and tokens we want to match:
 > #### Visualizing the parse
 >
 > The [`displacy` visualizer](/usage/visualizer) lets you render `Doc` objects
 > and their dependency parse and part-of-speech tags:
 >
 > ```python
 > import spacy
 > from spacy import displacy
 >
 > nlp = spacy.load("en_core_web_sm")
 > doc = nlp("Smith founded a healthcare company")
 > displacy.serve(doc)
 > ```
 import DisplaCyDepFoundedHtml from 'images/displacy-dep-founded.html'
 <Iframe title="displaCy visualization of dependencies" html={DisplaCyDepFoundedHtml} height={450} />
 The relations we're interested in are:
- the founder is the subject (`nsubj`) of the token with the text `founded`
+- the founder is the **subject** (`nsubj`) of the token with the text `founded`
- the company is the object (`dobj`) of `founded`
+- the company is the **object** (`dobj`) of `founded`
- the kind of company may be an adjective (`amod`, not shown above) or a
+- the kind of company may be an **adjective** (`amod`, not shown above) or a
-  compound (`compound`)
+  **compound** (`compound`)
-The first step is to pick an anchor token for the pattern. Since it's the root
+The first step is to pick an **anchor token** for the pattern. Since it's the
-of the dependency parse, `founded` is a good choice here. It is often easier to
+root of the dependency parse, `founded` is a good choice here. It is often
-construct patterns when all dependency relation operators point from the head to
+easier to construct patterns when all dependency relation operators point from
-the children. In this example, we'll only use `>`, which connects a head to an
+the head to the children. In this example, we'll only use `>`, which connects a
-immediate dependent as `head > child`.
+head to an immediate dependent as `head > child`.
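As a rough illustration of why the root makes a convenient anchor, the sketch below finds the root of a hand-written toy parse and lists the tokens reachable from it with a single `>` step. The `words`, `heads` and `deps` lists are assumptions written out by hand for this example, not output from a parser. The convention that the root is its own head mirrors `Token.head` in spaCy.

```python
# Hand-written toy parse of "Smith founded a healthcare company".
words = ["Smith", "founded", "a", "healthcare", "company"]
heads = [1, 1, 4, 4, 1]  # index of each token's head; the root is its own head
deps = ["nsubj", "ROOT", "det", "compound", "dobj"]

# The root is the only token that is its own head.
root = next(i for i, head in enumerate(heads) if head == i)

# Every immediate dependent can be linked to the anchor as `head > child`.
children = [
    (words[i], deps[i])
    for i, head in enumerate(heads)
    if head == root and i != root
]

print(words[root])  # founded
print(children)     # [('Smith', 'nsubj'), ('company', 'dobj')]
```

Starting from the root means the subject and the object are each one `>` step away, and the modifier of the object is one more `>` step from there.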
 The simplest dependency matcher pattern will identify and name a single token in
 the tree:
@@ -1099,7 +1115,6 @@ from spacy.matcher import DependencyMatcher
 nlp = spacy.load("en_core_web_sm")
 matcher = DependencyMatcher(nlp.vocab)
 pattern = [
  {
    "RIGHT_ID": "anchor_founded",       # unique name
@@ -1116,6 +1131,7 @@ Now that we have a named anchor token (`anchor_founded`), we can add the founder
 as the immediate dependent (`>`) of `founded` with the dependency label `nsubj`:
 ```python
 ### Step 1 {highlight="8,10"}
 pattern = [
    {
        "RIGHT_ID": "anchor_founded",
@@ -1127,31 +1143,37 @@ pattern = [
        "RIGHT_ID": "subject",
        "RIGHT_ATTRS": {"DEP": "nsubj"},
    }
    # ...
 ]
 ```
 The direct object (`dobj`) is added in the same way:
 ```python
-pattern = [ ...
+### Step 2 {highlight=""}
 pattern = [
    # ...
    {
        "LEFT_ID": "anchor_founded",
        "REL_OP": ">",
        "RIGHT_ID": "founded_object",
        "RIGHT_ATTRS": {"DEP": "dobj"},
    }
    # ...
 ]
 ```
 When the subject and object tokens are added, they are required to have names
 under the key `RIGHT_ID`, which are allowed to be any unique string, e.g.
-`founded_subject`. These names can then be used as `LEFT_ID` to link new tokens
+`founded_subject`. These names can then be used as `LEFT_ID` to **link new
-into the pattern. For the final part of our pattern, we'll specify that the
+tokens into the pattern**. For the final part of our pattern, we'll specify that
-token `founded_object` should have a modifier with the dependency relation
+the token `founded_object` should have a modifier with the dependency relation
 `amod` or `compound`:
 ```python
-pattern = [ ...
+### Step 3 {highlight="7"}
 pattern = [
    # ...
    {
        "LEFT_ID": "founded_object",
        "REL_OP": ">",
@@ -1168,8 +1190,6 @@ each new token needs to be linked to an existing token on its left. As for
 `founded` in this example, a token may be linked to more than one token on its
 right:
 <!-- TODO: adjust for final example, prettify -->
 ![Dependency matcher pattern](../images/dep-match-diagram.svg)
 The full pattern comes together as shown in the example below:
@@ -1209,11 +1229,10 @@ pattern = [
 matcher.add("FOUNDED", [pattern])
 doc = nlp("Lee, an experienced CEO, has founded two AI startups.")
 matches = matcher(doc)
 print(matches) # [(4851363122962674176, [6, 0, 10, 9])]
-# each token_id corresponds to one pattern dict
+print(matches) # [(4851363122962674176, [6, 0, 10, 9])]
 # Each token_id corresponds to one pattern dict
 match_id, token_ids = matches[0]
 for i in range(len(token_ids)):
    print(pattern[i]["RIGHT_ID"] + ":", doc[token_ids[i]].text)
--- a/website/docs/usage/v3.md
+++ b/website/docs/usage/v3.md
@@ -26,6 +26,7 @@ menu:
 - [End-to-end project workflows](#features-projects)
 - [New built-in components](#features-pipeline-components)
 - [New custom component API](#features-components)
 - [Dependency matching](#features-dep-matcher)
 - [Python type hints](#features-types)
 - [New methods & attributes](#new-methods)
 - [New & updated documentation](#new-docs)
@@ -152,7 +153,6 @@ add to your pipeline and customize for your use case:
 | [`Morphologizer`](/api/morphologizer)           | Trainable component to predict morphological features.                                                                                                                                                                  |
 | [`Lemmatizer`](/api/lemmatizer)                 | Standalone component for rule-based and lookup lemmatization.                                                                                                                                                           |
 | [`AttributeRuler`](/api/attributeruler)         | Component for setting token attributes using match patterns.                                                                                                                                                            |
 | [`DependencyMatcher`](/api/dependencymatcher)   | Component for matching subtrees within a dependency parse.                                                                                                                                                              |
 | [`Transformer`](/api/transformer)               | Component for using [transformer models](/usage/embeddings-transformers) in your pipeline, accessing outputs and aligning tokens. Provided via [`spacy-transformers`](https://github.com/explosion/spacy-transformers). |
 <Infobox title="Details & Documentation" emoji="📖" list>
@@ -202,6 +202,34 @@ aren't set.
 </Infobox>
 ### Dependency matching {#features-dep-matcher}
 <!-- TODO: improve summary -->
 > #### Example
 >
 > ```python
 > # TODO: example
 > ```
 The [`DependencyMatcher`](/api/dependencymatcher) lets you match patterns within
 the dependency parse using
 [Semgrex](https://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/semgraph/semgrex/SemgrexPattern.html)
 operators. It follows the same API as the token-based [`Matcher`](/api/matcher).
 A pattern added to the dependency matcher consists of a **list of
 dictionaries**, with each dictionary describing a **token to match** and its
 **relation to an existing token** in the pattern.
 <Infobox title="Details & Documentation" emoji="📖" list>
 - **Usage:**
  [Dependency matching](/usage/rule-based-matching#dependencymatcher)
 - **API:** [`DependencyMatcher`](/api/dependencymatcher)
 - **Implementation:**
  [`spacy/matcher/dependencymatcher.pyx`](https://github.com/explosion/spaCy/tree/develop/spacy/matcher/dependencymatcher.pyx)
 </Infobox>
 ### Type hints and type-based data validation {#features-types}
 > #### Example