mirror of
				https://github.com/explosion/spaCy.git
				synced 2025-10-31 16:07:41 +03:00 
			
		
		
		
	WIP: update docs [ci skip]
This commit is contained in:
		
parent f174c7b1f3
commit 157caf4dfa
					
				|  | @ -11,7 +11,8 @@ and [`PhraseMatcher`](/api/phrasematcher) and lets you match on dependency trees | ||||||
| using | using | ||||||
| [Semgrex operators](https://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/semgraph/semgrex/SemgrexPattern.html). | [Semgrex operators](https://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/semgraph/semgrex/SemgrexPattern.html). | ||||||
| It requires a pretrained [`DependencyParser`](/api/parser) or other component | It requires a pretrained [`DependencyParser`](/api/parser) or other component | ||||||
| that sets the `Token.dep` and `Token.head` attributes. | that sets the `Token.dep` and `Token.head` attributes. See the | ||||||
|  | [usage guide](/usage/rule-based-matching#dependencymatcher) for examples. | ||||||
| 
 | 
 | ||||||
| ## Pattern format {#patterns} | ## Pattern format {#patterns} | ||||||
| 
 | 
 | ||||||
|  | @ -48,63 +49,18 @@ dictionary, which defines an anchor token using only `RIGHT_ID` and | ||||||
| 
 | 
 | ||||||
| | Name          | Description                                                                                                                                                            | | | Name          | Description                                                                                                                                                            | | ||||||
| | ------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | | ------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | ||||||
| | `LEFT_ID`     | The name of the left-hand node in the relation, which has been defined in an earlier node.                                                                             | | | `LEFT_ID`     | The name of the left-hand node in the relation, which has been defined in an earlier node. ~~str~~                                                                     | | ||||||
| | `REL_OP`      | An operator that describes how the two nodes are related. ~~str~~                                                                                                      | | | `REL_OP`      | An operator that describes how the two nodes are related. ~~str~~                                                                                                      | | ||||||
| | `RIGHT_ID`    | A unique name for the right-hand node in the relation. ~~str~~                                                                                                         | | | `RIGHT_ID`    | A unique name for the right-hand node in the relation. ~~str~~                                                                                                         | | ||||||
| | `RIGHT_ATTRS` | The token attributes to match for the right-hand node in the same format as patterns provided to the regular token-based [`Matcher`](/api/matcher). ~~Dict[str, Any]~~ | | | `RIGHT_ATTRS` | The token attributes to match for the right-hand node in the same format as patterns provided to the regular token-based [`Matcher`](/api/matcher). ~~Dict[str, Any]~~ | | ||||||
| 
 | 
 | ||||||
| The first pattern defines an anchor token and each additional token added to the | <Infobox title="Designing dependency matcher patterns" emoji="📖"> | ||||||
| pattern is linked to an existing token `LEFT_ID` by the relation `REL_OP` and is |  | ||||||
| described by the name `RIGHT_ID` and the attributes `RIGHT_ATTRS`. |  | ||||||
| 
 | 
 | ||||||
| Let's say we want to find sentences describing who founded what kind of company: | For examples of how to construct dependency matcher patterns for different types | ||||||
|  | of relations, see the usage guide on | ||||||
|  | [dependency matching](/usage/rule-based-matching#dependencymatcher). | ||||||
| 
 | 
 | ||||||
| - `Smith founded a healthcare company in 2005.` | </Infobox> | ||||||
| - `Williams initially founded an insurance company in 1987.` |  | ||||||
| - `Lee, an established CEO, founded yet another AI startup.` |  | ||||||
| 
 |  | ||||||
| Since it's the root of the dependency parse, `founded` is a good choice for the |  | ||||||
| anchor token in our pattern: |  | ||||||
| 
 |  | ||||||
| ```python |  | ||||||
| pattern = [ |  | ||||||
|     {"RIGHT_ID": "anchor_founded", "RIGHT_ATTRS": {"ORTH": "founded"}} |  | ||||||
| ] |  | ||||||
| ``` |  | ||||||
| 
 |  | ||||||
| We can add the subject as the token with the dependency label `nsubj` that is a |  | ||||||
| direct child `>` of the anchor token named `anchor_founded`: |  | ||||||
| 
 |  | ||||||
| ```python |  | ||||||
| pattern = [ |  | ||||||
|     {"RIGHT_ID": "anchor_founded", "RIGHT_ATTRS": {"ORTH": "founded"}}, |  | ||||||
|     { |  | ||||||
|         "LEFT_ID": "anchor_founded", |  | ||||||
|         "REL_OP": ">", |  | ||||||
|         "RIGHT_ID": "subject", |  | ||||||
|         "RIGHT_ATTRS": {"DEP": "nsubj"}, |  | ||||||
|     } |  | ||||||
| ] |  | ||||||
| ``` |  | ||||||
| 
 |  | ||||||
| And the direct object along with its modifier: |  | ||||||
| 
 |  | ||||||
| ```python |  | ||||||
| pattern = [ ... |  | ||||||
|     { |  | ||||||
|         "LEFT_ID": "anchor_founded", |  | ||||||
|         "REL_OP": ">", |  | ||||||
|         "RIGHT_ID": "founded_object", |  | ||||||
|         "RIGHT_ATTRS": {"DEP": "dobj"}, |  | ||||||
|     }, |  | ||||||
|     { |  | ||||||
|         "LEFT_ID": "founded_object", |  | ||||||
|         "REL_OP": ">", |  | ||||||
|         "RIGHT_ID": "founded_object_modifier", |  | ||||||
|         "RIGHT_ATTRS": {"DEP": {"IN": ["amod", "compound"]}}, |  | ||||||
|     } |  | ||||||
| ] |  | ||||||
| ``` |  | ||||||
| 
 | 
 | ||||||
| ### Operators | ### Operators | ||||||
| 
 | 
 | ||||||
|  | @ -113,19 +69,19 @@ come directly from | ||||||
| [Semgrex](https://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/semgraph/semgrex/SemgrexPattern.html): | [Semgrex](https://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/semgraph/semgrex/SemgrexPattern.html): | ||||||
| 
 | 
 | ||||||
| | Symbol    | Description                                                                                                          | | | Symbol    | Description                                                                                                          | | ||||||
| | --------- | ------------------------------------------------------------------------------------------------------------------- | | | --------- | -------------------------------------------------------------------------------------------------------------------- | | ||||||
| | `A < B`   | `A` is the immediate dependent of `B`                                                                               | | | `A < B`   | `A` is the immediate dependent of `B`.                                                                               | | ||||||
| | `A > B`   | `A` is the immediate head of `B`                                                                                    | | | `A > B`   | `A` is the immediate head of `B`.                                                                                    | | ||||||
| | `A << B`  | `A` is the dependent in a chain to `B` following dep->head paths                                                    | | | `A << B`  | `A` is the dependent in a chain to `B` following dep → head paths.                                              | | ||||||
| | `A >> B`  | `A` is the head in a chain to `B` following head->dep paths                                                         | | | `A >> B`  | `A` is the head in a chain to `B` following head → dep paths.                                                   | | ||||||
| | `A . B`   | `A` immediately precedes `B`, i.e. `A.i == B.i - 1`, and both are within the same dependency tree                   | | | `A . B`   | `A` immediately precedes `B`, i.e. `A.i == B.i - 1`, and both are within the same dependency tree.                   | | ||||||
| | `A .* B`  | `A` precedes `B`, i.e. `A.i < B.i`, and both are within the same dependency tree _(not in Semgrex)_                 | | | `A .* B`  | `A` precedes `B`, i.e. `A.i < B.i`, and both are within the same dependency tree _(not in Semgrex)_.                 | | ||||||
| | `A ; B`   | `A` immediately follows `B`, i.e. `A.i == B.i + 1`, and both are within the same dependency tree _(not in Semgrex)_ | | | `A ; B`   | `A` immediately follows `B`, i.e. `A.i == B.i + 1`, and both are within the same dependency tree _(not in Semgrex)_. | | ||||||
| | `A ;* B`  | `A` follows `B`, i.e. `A.i > B.i`, and both are within the same dependency tree _(not in Semgrex)_                  | | | `A ;* B`  | `A` follows `B`, i.e. `A.i > B.i`, and both are within the same dependency tree _(not in Semgrex)_.                  | | ||||||
| | `A $+ B`  | `B` is a right immediate sibling of `A`, i.e. `A` and `B` have the same parent and `A.i == B.i - 1`                 | | | `A $+ B`  | `B` is a right immediate sibling of `A`, i.e. `A` and `B` have the same parent and `A.i == B.i - 1`.                 | | ||||||
| | `A $- B`  | `B` is a left immediate sibling of `A`, i.e. `A` and `B` have the same parent and `A.i == B.i + 1`                  | | | `A $- B`  | `B` is a left immediate sibling of `A`, i.e. `A` and `B` have the same parent and `A.i == B.i + 1`.                  | | ||||||
| | `A $++ B` | `B` is a right sibling of `A`, i.e. `A` and `B` have the same parent and `A.i < B.i`                                | | | `A $++ B` | `B` is a right sibling of `A`, i.e. `A` and `B` have the same parent and `A.i < B.i`.                                | | ||||||
| | `A $-- B` | `B` is a left sibling of `A`, i.e. `A` and `B` have the same parent and `A.i > B.i`                                 | | | `A $-- B` | `B` is a left sibling of `A`, i.e. `A` and `B` have the same parent and `A.i > B.i`.                                 | | ||||||
| 
 | 
 | ||||||
| ## DependencyMatcher.\_\_init\_\_ {#init tag="method"} | ## DependencyMatcher.\_\_init\_\_ {#init tag="method"} | ||||||
| 
 | 
 | ||||||
|  |  | ||||||
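The operator table in the diff above can be read as predicates over token indices and head pointers. The block below is a minimal pure-Python sketch of that reading, not spaCy's implementation: `heads`, `OPS`, and the helper names are hypothetical, and it assumes the convention that a root token is its own head.

```python
# Toy model: heads[i] is the index of token i's head; a root is its own head.
# All names here are illustrative assumptions, not part of the spaCy API.

def root_of(heads, i):
    """Follow head pointers up from token i until a root is reached."""
    while heads[i] != i:
        i = heads[i]
    return i


def same_tree(heads, a, b):
    """True if tokens a and b hang off the same root."""
    return root_of(heads, a) == root_of(heads, b)


def is_head_of(heads, a, b):
    """A > B: a is the immediate head of b."""
    return a != b and heads[b] == a


def is_ancestor_of(heads, a, b):
    """A >> B: a is reached from b by following one or more head links."""
    node = b
    while heads[node] != node:
        node = heads[node]
        if node == a:
            return True
    return False


def are_siblings(heads, a, b):
    """a and b are distinct non-root tokens with the same parent."""
    return a != b and heads[a] != a and heads[b] != b and heads[a] == heads[b]


# Each entry takes (heads, a, b) and answers "does `A op B` hold?"
OPS = {
    "<": lambda h, a, b: is_head_of(h, b, a),
    ">": lambda h, a, b: is_head_of(h, a, b),
    "<<": lambda h, a, b: is_ancestor_of(h, b, a),
    ">>": lambda h, a, b: is_ancestor_of(h, a, b),
    ".": lambda h, a, b: a == b - 1 and same_tree(h, a, b),
    ".*": lambda h, a, b: a < b and same_tree(h, a, b),
    ";": lambda h, a, b: a == b + 1 and same_tree(h, a, b),
    ";*": lambda h, a, b: a > b and same_tree(h, a, b),
    "$+": lambda h, a, b: are_siblings(h, a, b) and a == b - 1,
    "$-": lambda h, a, b: are_siblings(h, a, b) and a == b + 1,
    "$++": lambda h, a, b: are_siblings(h, a, b) and a < b,
    "$--": lambda h, a, b: are_siblings(h, a, b) and a > b,
}

# Hand-written parse for "Smith founded a healthcare company" (an assumption).
heads = [1, 1, 4, 4, 1]
```

With this toy parse, `OPS[">"](heads, 1, 0)` holds because `founded` (index 1) is the immediate head of `Smith` (index 0), and `OPS[">>"](heads, 1, 3)` holds because `healthcare` reaches `founded` via `company`.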
										
											
												File diff suppressed because one or more lines are too long
											
										
									
								
							| Before Width: | Height: | Size: 4.6 KiB After Width: | Height: | Size: 25 KiB | 
|  | @ -20,7 +20,7 @@ | ||||||
| </text> | </text> | ||||||
| 
 | 
 | ||||||
| <text class="displacy-token" fill="currentColor" text-anchor="middle" y="309.5"> | <text class="displacy-token" fill="currentColor" text-anchor="middle" y="309.5"> | ||||||
|     <tspan class="displacy-word" fill="currentColor" x="750">company.</tspan> |     <tspan class="displacy-word" fill="currentColor" x="750">company</tspan> | ||||||
|     <tspan class="displacy-tag" dy="2em" fill="currentColor" x="750"></tspan> |     <tspan class="displacy-tag" dy="2em" fill="currentColor" x="750"></tspan> | ||||||
| </text> | </text> | ||||||
| 
 | 
 | ||||||
|  |  | ||||||
| Before Width: | Height: | Size: 3.8 KiB After Width: | Height: | Size: 3.8 KiB | 
|  | @ -974,10 +974,12 @@ to match phrases with the same sequence of punctuation and non-punctuation | ||||||
| tokens as the pattern. But this can easily get confusing and doesn't have much | tokens as the pattern. But this can easily get confusing and doesn't have much | ||||||
| of an advantage over writing one or two token patterns. | of an advantage over writing one or two token patterns. | ||||||
| 
 | 
 | ||||||
| ## Dependency Matcher {#dependencymatcher new="3"} | ## Dependency Matcher {#dependencymatcher new="3" model="parser"} | ||||||
| 
 | 
 | ||||||
| The [`DependencyMatcher`](/api/dependencymatcher) lets you match patterns within | The [`DependencyMatcher`](/api/dependencymatcher) lets you match patterns within | ||||||
| the dependency parse. It requires a model containing a parser such as the | the dependency parse using | ||||||
|  | [Semgrex](https://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/semgraph/semgrex/SemgrexPattern.html) | ||||||
|  | operators. It requires a model containing a parser such as the | ||||||
| [`DependencyParser`](/api/dependencyparser). Instead of defining a list of | [`DependencyParser`](/api/dependencyparser). Instead of defining a list of | ||||||
| adjacent tokens as in `Matcher` patterns, the `DependencyMatcher` patterns match | adjacent tokens as in `Matcher` patterns, the `DependencyMatcher` patterns match | ||||||
| tokens in the dependency parse and specify the relations between them. | tokens in the dependency parse and specify the relations between them. | ||||||
|  | @ -1014,15 +1016,15 @@ tokens in the dependency parse and specify the relations between them. | ||||||
| > matches = matcher(doc) | > matches = matcher(doc) | ||||||
| > ``` | > ``` | ||||||
| 
 | 
 | ||||||
| A pattern added to the `DependencyMatcher` consists of a list of dictionaries, | A pattern added to the dependency matcher consists of a **list of | ||||||
| with each dictionary describing a token to match and its relation to an existing | dictionaries**, with each dictionary describing a **token to match** and its | ||||||
| token in the pattern. Except for the first dictionary, which defines an anchor | **relation to an existing token** in the pattern. Except for the first | ||||||
| token using only `RIGHT_ID` and `RIGHT_ATTRS`, each pattern should have the | dictionary, which defines an anchor token using only `RIGHT_ID` and | ||||||
| following keys: | `RIGHT_ATTRS`, each pattern should have the following keys: | ||||||
| 
 | 
 | ||||||
| | Name          | Description                                                                                                                                                            | | | Name          | Description                                                                                                                                                            | | ||||||
| | ------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | | ------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | ||||||
| | `LEFT_ID`     | The name of the left-hand node in the relation, which has been defined in an earlier node.                                                                             | | | `LEFT_ID`     | The name of the left-hand node in the relation, which has been defined in an earlier node. ~~str~~                                                                     | | ||||||
| | `REL_OP`      | An operator that describes how the two nodes are related. ~~str~~                                                                                                      | | | `REL_OP`      | An operator that describes how the two nodes are related. ~~str~~                                                                                                      | | ||||||
| | `RIGHT_ID`    | A unique name for the right-hand node in the relation. ~~str~~                                                                                                         | | | `RIGHT_ID`    | A unique name for the right-hand node in the relation. ~~str~~                                                                                                         | | ||||||
| | `RIGHT_ATTRS` | The token attributes to match for the right-hand node in the same format as patterns provided to the regular token-based [`Matcher`](/api/matcher). ~~Dict[str, Any]~~ | | | `RIGHT_ATTRS` | The token attributes to match for the right-hand node in the same format as patterns provided to the regular token-based [`Matcher`](/api/matcher). ~~Dict[str, Any]~~ | | ||||||
|  | @ -1040,54 +1042,68 @@ can be used as `LEFT_ID` in another dict. | ||||||
| 
 | 
 | ||||||
| </Infobox> | </Infobox> | ||||||
| 
 | 
 | ||||||
| ### Dependency matcher operators | ### Dependency matcher operators {#dependencymatcher-operators} | ||||||
| 
 | 
 | ||||||
| The following operators are supported by the `DependencyMatcher`, most of which | The following operators are supported by the `DependencyMatcher`, most of which | ||||||
| come directly from | come directly from | ||||||
| [Semgrex](https://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/semgraph/semgrex/SemgrexPattern.html): | [Semgrex](https://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/semgraph/semgrex/SemgrexPattern.html): | ||||||
| 
 | 
 | ||||||
| | Symbol    | Description                                                                                                          | | | Symbol    | Description                                                                                                          | | ||||||
| | --------- | ------------------------------------------------------------------------------------------------------------------- | | | --------- | -------------------------------------------------------------------------------------------------------------------- | | ||||||
| | `A < B`   | `A` is the immediate dependent of `B`                                                                               | | | `A < B`   | `A` is the immediate dependent of `B`.                                                                               | | ||||||
| | `A > B`   | `A` is the immediate head of `B`                                                                                    | | | `A > B`   | `A` is the immediate head of `B`.                                                                                    | | ||||||
| | `A << B`  | `A` is the dependent in a chain to `B` following dep->head paths                                                    | | | `A << B`  | `A` is the dependent in a chain to `B` following dep → head paths.                                              | | ||||||
| | `A >> B`  | `A` is the head in a chain to `B` following head->dep paths                                                         | | | `A >> B`  | `A` is the head in a chain to `B` following head → dep paths.                                                   | | ||||||
| | `A . B`   | `A` immediately precedes `B`, i.e. `A.i == B.i - 1`, and both are within the same dependency tree                   | | | `A . B`   | `A` immediately precedes `B`, i.e. `A.i == B.i - 1`, and both are within the same dependency tree.                   | | ||||||
| | `A .* B`  | `A` precedes `B`, i.e. `A.i < B.i`, and both are within the same dependency tree _(not in Semgrex)_                 | | | `A .* B`  | `A` precedes `B`, i.e. `A.i < B.i`, and both are within the same dependency tree _(not in Semgrex)_.                 | | ||||||
| | `A ; B`   | `A` immediately follows `B`, i.e. `A.i == B.i + 1`, and both are within the same dependency tree _(not in Semgrex)_ | | | `A ; B`   | `A` immediately follows `B`, i.e. `A.i == B.i + 1`, and both are within the same dependency tree _(not in Semgrex)_. | | ||||||
| | `A ;* B`  | `A` follows `B`, i.e. `A.i > B.i`, and both are within the same dependency tree _(not in Semgrex)_                  | | | `A ;* B`  | `A` follows `B`, i.e. `A.i > B.i`, and both are within the same dependency tree _(not in Semgrex)_.                  | | ||||||
| | `A $+ B`  | `B` is a right immediate sibling of `A`, i.e. `A` and `B` have the same parent and `A.i == B.i - 1`                 | | | `A $+ B`  | `B` is a right immediate sibling of `A`, i.e. `A` and `B` have the same parent and `A.i == B.i - 1`.                 | | ||||||
| | `A $- B`  | `B` is a left immediate sibling of `A`, i.e. `A` and `B` have the same parent and `A.i == B.i + 1`                  | | | `A $- B`  | `B` is a left immediate sibling of `A`, i.e. `A` and `B` have the same parent and `A.i == B.i + 1`.                  | | ||||||
| | `A $++ B` | `B` is a right sibling of `A`, i.e. `A` and `B` have the same parent and `A.i < B.i`                                | | | `A $++ B` | `B` is a right sibling of `A`, i.e. `A` and `B` have the same parent and `A.i < B.i`.                                | | ||||||
| | `A $-- B` | `B` is a left sibling of `A`, i.e. `A` and `B` have the same parent and `A.i > B.i`                                 | | | `A $-- B` | `B` is a left sibling of `A`, i.e. `A` and `B` have the same parent and `A.i > B.i`.                                 | | ||||||
| 
 | 
 | ||||||
| ### Designing dependency matcher patterns | ### Designing dependency matcher patterns {#dependencymatcher-patterns} | ||||||
| 
 | 
 | ||||||
| Let's say we want to find sentences describing who founded what kind of company: | Let's say we want to find sentences describing who founded what kind of company: | ||||||
| 
 | 
 | ||||||
| - `Smith founded a healthcare company in 2005.` | - _Smith founded a healthcare company in 2005._ | ||||||
| - `Williams initially founded an insurance company in 1987.` | - _Williams initially founded an insurance company in 1987._ | ||||||
| - `Lee, an experienced CEO, has founded two AI startups.` | - _Lee, an experienced CEO, has founded two AI startups._ | ||||||
| 
 | 
 | ||||||
| The dependency parse for `Smith founded a healthcare company` shows types of | The dependency parse for "Smith founded a healthcare company" shows types of | ||||||
| relations and tokens we want to match: | relations and tokens we want to match: | ||||||
| 
 | 
 | ||||||
|  | > #### Visualizing the parse | ||||||
|  | > | ||||||
|  | > The [`displacy` visualizer](/usage/visualizer) lets you render `Doc` objects | ||||||
|  | > and their dependency parse and part-of-speech tags: | ||||||
|  | > | ||||||
|  | > ```python | ||||||
|  | > import spacy | ||||||
|  | > from spacy import displacy | ||||||
|  | > | ||||||
|  | > nlp = spacy.load("en_core_web_sm") | ||||||
|  | > doc = nlp("Smith founded a healthcare company") | ||||||
|  | > displacy.serve(doc) | ||||||
|  | > ``` | ||||||
|  | 
 | ||||||
| import DisplaCyDepFoundedHtml from 'images/displacy-dep-founded.html' | import DisplaCyDepFoundedHtml from 'images/displacy-dep-founded.html' | ||||||
| 
 | 
 | ||||||
| <Iframe title="displaCy visualization of dependencies" html={DisplaCyDepFoundedHtml} height={450} /> | <Iframe title="displaCy visualization of dependencies" html={DisplaCyDepFoundedHtml} height={450} /> | ||||||
| 
 | 
 | ||||||
| The relations we're interested in are: | The relations we're interested in are: | ||||||
| 
 | 
 | ||||||
| - the founder is the subject (`nsubj`) of the token with the text `founded` | - the founder is the **subject** (`nsubj`) of the token with the text `founded` | ||||||
| - the company is the object (`dobj`) of `founded` | - the company is the **object** (`dobj`) of `founded` | ||||||
| - the kind of company may be an adjective (`amod`, not shown above) or a | - the kind of company may be an **adjective** (`amod`, not shown above) or a | ||||||
|   compound (`compound`) |   **compound** (`compound`) | ||||||
| 
 | 
 | ||||||
| The first step is to pick an anchor token for the pattern. Since it's the root | The first step is to pick an **anchor token** for the pattern. Since it's the | ||||||
| of the dependency parse, `founded` is a good choice here. It is often easier to | root of the dependency parse, `founded` is a good choice here. It is often | ||||||
| construct patterns when all dependency relation operators point from the head to | easier to construct patterns when all dependency relation operators point from | ||||||
| the children. In this example, we'll only use `>`, which connects a head to an | the head to the children. In this example, we'll only use `>`, which connects a | ||||||
| immediate dependent as `head > child`. | head to an immediate dependent as `head > child`. | ||||||
| 
 | 
 | ||||||
| The simplest dependency matcher pattern will identify and name a single token in | The simplest dependency matcher pattern will identify and name a single token in | ||||||
| the tree: | the tree: | ||||||
|  | @ -1099,7 +1115,6 @@ from spacy.matcher import DependencyMatcher | ||||||
| 
 | 
 | ||||||
| nlp = spacy.load("en_core_web_sm") | nlp = spacy.load("en_core_web_sm") | ||||||
| matcher = DependencyMatcher(nlp.vocab) | matcher = DependencyMatcher(nlp.vocab) | ||||||
| 
 |  | ||||||
| pattern = [ | pattern = [ | ||||||
|   { |   { | ||||||
|     "RIGHT_ID": "anchor_founded",       # unique name |     "RIGHT_ID": "anchor_founded",       # unique name | ||||||
|  | @ -1116,6 +1131,7 @@ Now that we have a named anchor token (`anchor_founded`), we can add the founder | ||||||
| as the immediate dependent (`>`) of `founded` with the dependency label `nsubj`: | as the immediate dependent (`>`) of `founded` with the dependency label `nsubj`: | ||||||
| 
 | 
 | ||||||
| ```python | ```python | ||||||
|  | ### Step 1 {highlight="8,10"} | ||||||
| pattern = [ | pattern = [ | ||||||
|     { |     { | ||||||
|         "RIGHT_ID": "anchor_founded", |         "RIGHT_ID": "anchor_founded", | ||||||
|  | @ -1127,31 +1143,37 @@ pattern = [ | ||||||
|         "RIGHT_ID": "subject", |         "RIGHT_ID": "subject", | ||||||
|         "RIGHT_ATTRS": {"DEP": "nsubj"}, |         "RIGHT_ATTRS": {"DEP": "nsubj"}, | ||||||
|     } |     } | ||||||
|  |     # ... | ||||||
| ] | ] | ||||||
| ``` | ``` | ||||||
| 
 | 
 | ||||||
| The direct object (`dobj`) is added in the same way: | The direct object (`dobj`) is added in the same way: | ||||||
| 
 | 
 | ||||||
| ```python | ```python | ||||||
| pattern = [ ... | ### Step 2 {highlight=""} | ||||||
|  | pattern = [ | ||||||
|  |     #... | ||||||
|     { |     { | ||||||
|         "LEFT_ID": "anchor_founded", |         "LEFT_ID": "anchor_founded", | ||||||
|         "REL_OP": ">", |         "REL_OP": ">", | ||||||
|         "RIGHT_ID": "founded_object", |         "RIGHT_ID": "founded_object", | ||||||
|         "RIGHT_ATTRS": {"DEP": "dobj"}, |         "RIGHT_ATTRS": {"DEP": "dobj"}, | ||||||
|     } |     } | ||||||
|  |     # ... | ||||||
| ] | ] | ||||||
| ``` | ``` | ||||||
| 
 | 
 | ||||||
| When the subject and object tokens are added, they are required to have names | When the subject and object tokens are added, they are required to have names | ||||||
| under the key `RIGHT_ID`, which are allowed to be any unique string, e.g. | under the key `RIGHT_ID`, which are allowed to be any unique string, e.g. | ||||||
| `founded_subject`. These names can then be used as `LEFT_ID` to link new tokens | `founded_subject`. These names can then be used as `LEFT_ID` to **link new | ||||||
| into the pattern. For the final part of our pattern, we'll specify that the | tokens into the pattern**. For the final part of our pattern, we'll specify that | ||||||
| token `founded_object` should have a modifier with the dependency relation | the token `founded_object` should have a modifier with the dependency relation | ||||||
| `amod` or `compound`: | `amod` or `compound`: | ||||||
| 
 | 
 | ||||||
| ```python | ```python | ||||||
| pattern = [ ... | ### Step 3 {highlight="7"} | ||||||
|  | pattern = [ | ||||||
|  |     # ... | ||||||
|     { |     { | ||||||
|         "LEFT_ID": "founded_object", |         "LEFT_ID": "founded_object", | ||||||
|         "REL_OP": ">", |         "REL_OP": ">", | ||||||
|  | @ -1168,8 +1190,6 @@ each new token needs to be linked to an existing token on its left. As for | ||||||
| `founded` in this example, a token may be linked to more than one token on its | `founded` in this example, a token may be linked to more than one token on its | ||||||
| right: | right: | ||||||
| 
 | 
 | ||||||
| <!-- TODO: adjust for final example, prettify --> |  | ||||||
| 
 |  | ||||||
|  |  | ||||||
| 
 | 
 | ||||||
| The full pattern comes together as shown in the example below: | The full pattern comes together as shown in the example below: | ||||||
|  | @ -1209,11 +1229,10 @@ pattern = [ | ||||||
| 
 | 
 | ||||||
| matcher.add("FOUNDED", [pattern]) | matcher.add("FOUNDED", [pattern]) | ||||||
| doc = nlp("Lee, an experienced CEO, has founded two AI startups.") | doc = nlp("Lee, an experienced CEO, has founded two AI startups.") | ||||||
| 
 |  | ||||||
| matches = matcher(doc) | matches = matcher(doc) | ||||||
| print(matches) # [(4851363122962674176, [6, 0, 10, 9])] |  | ||||||
| 
 | 
 | ||||||
| # each token_id corresponds to one pattern dict | print(matches) # [(4851363122962674176, [6, 0, 10, 9])] | ||||||
|  | # Each token_id corresponds to one pattern dict | ||||||
| match_id, token_ids = matches[0] | match_id, token_ids = matches[0] | ||||||
| for i in range(len(token_ids)): | for i in range(len(token_ids)): | ||||||
|     print(pattern[i]["RIGHT_ID"] + ":", doc[token_ids[i]].text) |     print(pattern[i]["RIGHT_ID"] + ":", doc[token_ids[i]].text) | ||||||
|  |  | ||||||
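The pattern walkthrough in the diff above (anchor, subject, object, modifier) can be traced without a trained pipeline. The block below is a hedged sketch under stated assumptions: `WORDS`, `HEADS`, and `DEPS` are a hand-written parse of "Smith founded a healthcare company" rather than parser output, `toy_match` is a hypothetical brute-force helper that only understands the `>` operator and the `ORTH` and `DEP` attributes, and none of it reflects how the `DependencyMatcher` is implemented.

```python
from itertools import product

# Hand-written toy parse (an assumption, not output from a spaCy pipeline).
WORDS = ["Smith", "founded", "a", "healthcare", "company"]
HEADS = [1, 1, 4, 4, 1]  # index of each token's head; the root is its own head
DEPS = ["nsubj", "ROOT", "det", "compound", "dobj"]


def attrs_match(i, attrs):
    """Check token i against a RIGHT_ATTRS dict (ORTH and DEP only)."""
    token = {"ORTH": WORDS[i], "DEP": DEPS[i]}
    for key, expected in attrs.items():
        if isinstance(expected, dict):
            # Only the {"IN": [...]} form is handled in this sketch.
            if token[key] not in expected["IN"]:
                return False
        elif token[key] != expected:
            return False
    return True


def toy_match(pattern):
    """Return one list of token indices per match, ordered like the pattern."""
    names = [node["RIGHT_ID"] for node in pattern]
    candidates = [
        [i for i in range(len(WORDS)) if attrs_match(i, node["RIGHT_ATTRS"])]
        for node in pattern
    ]
    results = []
    for combo in product(*candidates):
        bound = dict(zip(names, combo))
        ok = True
        for node in pattern[1:]:
            if node["REL_OP"] != ">":
                raise ValueError("this sketch only supports the '>' operator")
            left = bound[node["LEFT_ID"]]
            right = bound[node["RIGHT_ID"]]
            # head > child: the right token's head must be the left token.
            if right == left or HEADS[right] != left:
                ok = False
                break
        if ok:
            results.append(list(combo))
    return results


pattern = [
    {"RIGHT_ID": "anchor_founded", "RIGHT_ATTRS": {"ORTH": "founded"}},
    {"LEFT_ID": "anchor_founded", "REL_OP": ">",
     "RIGHT_ID": "subject", "RIGHT_ATTRS": {"DEP": "nsubj"}},
    {"LEFT_ID": "anchor_founded", "REL_OP": ">",
     "RIGHT_ID": "founded_object", "RIGHT_ATTRS": {"DEP": "dobj"}},
    {"LEFT_ID": "founded_object", "REL_OP": ">",
     "RIGHT_ID": "founded_object_modifier",
     "RIGHT_ATTRS": {"DEP": {"IN": ["amod", "compound"]}}},
]

matches = toy_match(pattern)
print(matches)  # [[1, 0, 4, 3]]
```

As in the documented example, each index in a match lines up with one pattern dictionary, so `WORDS[matches[0][3]]` is the token bound to `founded_object_modifier` (`healthcare` in this toy parse).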
|  | @ -26,6 +26,7 @@ menu: | ||||||
| - [End-to-end project workflows](#features-projects) | - [End-to-end project workflows](#features-projects) | ||||||
| - [New built-in components](#features-pipeline-components) | - [New built-in components](#features-pipeline-components) | ||||||
| - [New custom component API](#features-components) | - [New custom component API](#features-components) | ||||||
|  | - [Dependency matching](#features-dep-matcher) | ||||||
| - [Python type hints](#features-types) | - [Python type hints](#features-types) | ||||||
| - [New methods & attributes](#new-methods) | - [New methods & attributes](#new-methods) | ||||||
| - [New & updated documentation](#new-docs) | - [New & updated documentation](#new-docs) | ||||||
|  | @ -152,7 +153,6 @@ add to your pipeline and customize for your use case: | ||||||
| | [`Morphologizer`](/api/morphologizer)           | Trainable component to predict morphological features.                                                                                                                                                                  | | | [`Morphologizer`](/api/morphologizer)           | Trainable component to predict morphological features.                                                                                                                                                                  | | ||||||
| | [`Lemmatizer`](/api/lemmatizer)                 | Standalone component for rule-based and lookup lemmatization.                                                                                                                                                           | | | [`Lemmatizer`](/api/lemmatizer)                 | Standalone component for rule-based and lookup lemmatization.                                                                                                                                                           | | ||||||
| | [`AttributeRuler`](/api/attributeruler)         | Component for setting token attributes using match patterns.                                                                                                                                                            | | | [`AttributeRuler`](/api/attributeruler)         | Component for setting token attributes using match patterns.                                                                                                                                                            | | ||||||
| | [`DependencyMatcher`](/api/dependencymatcher)   | Component for matching subtrees within a dependency parse.                                                                                                                                                              | |  | ||||||
| | [`Transformer`](/api/transformer)               | Component for using [transformer models](/usage/embeddings-transformers) in your pipeline, accessing outputs and aligning tokens. Provided via [`spacy-transformers`](https://github.com/explosion/spacy-transformers). | | | [`Transformer`](/api/transformer)               | Component for using [transformer models](/usage/embeddings-transformers) in your pipeline, accessing outputs and aligning tokens. Provided via [`spacy-transformers`](https://github.com/explosion/spacy-transformers). | | ||||||
| 
 | 
 | ||||||
| <Infobox title="Details & Documentation" emoji="📖" list> | <Infobox title="Details & Documentation" emoji="📖" list> | ||||||
|  | @ -202,6 +202,34 @@ aren't set. | ||||||
| 
 | 
 | ||||||
| </Infobox> | </Infobox> | ||||||
| 
 | 
 | ||||||
|  | ### Dependency matching {#features-dep-matcher} | ||||||
|  | 
 | ||||||
|  | <!-- TODO: improve summary --> | ||||||
|  | 
 | ||||||
|  | > #### Example | ||||||
|  | > | ||||||
|  | > ```python | ||||||
|  | > # TODO: example | ||||||
|  | > ``` | ||||||
|  | 
 | ||||||
|  | The [`DependencyMatcher`](/api/dependencymatcher) lets you match patterns within | ||||||
|  | the dependency parse using | ||||||
|  | [Semgrex](https://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/semgraph/semgrex/SemgrexPattern.html) | ||||||
|  | operators. It follows the same API as the token-based [`Matcher`](/api/matcher). | ||||||
|  | A pattern added to the dependency matcher consists of a **list of | ||||||
|  | dictionaries**, with each dictionary describing a **token to match** and its | ||||||
|  | **relation to an existing token** in the pattern. | ||||||
|  | 
 | ||||||
|  | <Infobox title="Details & Documentation" emoji="📖" list> | ||||||
|  | 
 | ||||||
|  | - **Usage:** | ||||||
|  |   [Dependency matching](/usage/rule-based-matching#dependencymatcher), | ||||||
|  | - **API:** [`DependencyMatcher`](/api/dependencymatcher), | ||||||
|  | - **Implementation:** | ||||||
|  |   [`spacy/matcher/dependencymatcher.pyx`](https://github.com/explosion/spaCy/tree/develop/spacy/matcher/dependencymatcher.pyx) | ||||||
|  | 
 | ||||||
|  | </Infobox> | ||||||
|  | 
 | ||||||
| ### Type hints and type-based data validation {#features-types} | ### Type hints and type-based data validation {#features-types} | ||||||
| 
 | 
 | ||||||
| > #### Example | > #### Example | ||||||
|  |  | ||||||