Mirror of https://github.com/explosion/spaCy.git (synced 2025-11-04 09:57:26 +03:00)
Update docs [ci skip]

commit f31c4462ca (parent dd84577a98)
@@ -82,6 +82,14 @@ check whether a [`Doc`](/api/doc) object has been parsed with the
 `doc.is_parsed` attribute, which returns a boolean value. If this attribute is
 `False`, the default sentence iterator will raise an exception.
 
+<Infobox title="Dependency label scheme" emoji="📖">
+
+For a list of the syntactic dependency labels assigned by spaCy's models across
+different languages, see the label schemes documented in the
+[models directory](/models).
+
+</Infobox>
+
 ### Noun chunks {#noun-chunks}
 
 Noun chunks are "base noun phrases" – flat phrases that have a noun as their
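As a minimal sketch of the sentence-boundary requirement described in the hunk above: when no dependency parse is available, `doc.sents` needs boundaries from another source, for example the rule-based `sentencizer` component. This example is not part of the commit and assumes spaCy v3's string-name `add_pipe` API.

```python
import spacy

# A blank pipeline has no parser, so sentence boundaries must be set
# by another component before doc.sents can be iterated.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

doc = nlp("This is a sentence. This is another sentence.")
sents = [sent.text for sent in doc.sents]
print(sents)  # ['This is a sentence.', 'This is another sentence.']
```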
@@ -288,11 +296,45 @@ for token in doc:
 | their                               | `ADJ`  | `poss`  | requests  |
 | requests                            | `NOUN` | `dobj`  | submit    |
 
-<Infobox title="Dependency label scheme" emoji="📖">
-
-For a list of the syntactic dependency labels assigned by spaCy's models across
-different languages, see the label schemes documented in the
-[models directory](/models).
-
+The dependency parse can be a useful tool for **information extraction**,
+especially when combined with other predictions like
+[named entities](#named-entities). The following example extracts money and
+currency values, i.e. entities labeled as `MONEY`, and then uses the dependency
+parse to find the noun phrase they are referring to – for example `"Net income"`
+→ `"$9.4 million"`.
+
+```python
+### {executable="true"}
+import spacy
+
+nlp = spacy.load("en_core_web_sm")
+# Merge noun phrases and entities for easier analysis
+nlp.add_pipe("merge_entities")
+nlp.add_pipe("merge_noun_chunks")
+
+TEXTS = [
+    "Net income was $9.4 million compared to the prior year of $2.7 million.",
+    "Revenue exceeded twelve billion dollars, with a loss of $1b.",
+]
+for doc in nlp.pipe(TEXTS):
+    for token in doc:
+        if token.ent_type_ == "MONEY":
+            # We have an attribute and direct object, so check for subject
+            if token.dep_ in ("attr", "dobj"):
+                subj = [w for w in token.head.lefts if w.dep_ == "nsubj"]
+                if subj:
+                    print(subj[0], "-->", token)
+            # We have a prepositional object with a preposition
+            elif token.dep_ == "pobj" and token.head.dep_ == "prep":
+                print(token.head.head, "-->", token)
+```
+
+<Infobox title="Combining models and rules" emoji="📖">
+
+For more examples of how to write rule-based information extraction logic that
+takes advantage of the model's predictions produced by the different components,
+see the usage guide on
+[combining models and rules](/usage/rule-based-matching#models-rules).
+
 </Infobox>
 
@@ -545,7 +587,7 @@ identifier from a knowledge base (KB). You can create your own
 [train a new Entity Linking model](/usage/training#entity-linker) using that
 custom-made KB.
 
-### Accessing entity identifiers {#entity-linking-accessing}
+### Accessing entity identifiers {#entity-linking-accessing model="entity linking"}
 
 The annotated KB identifier is accessible as either a hash value or as a string,
 using the attributes `ent.kb_id` and `ent.kb_id_` of a [`Span`](/api/span)
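The `ent.kb_id` / `ent.kb_id_` accessors mentioned in the hunk above can be demonstrated without a trained entity linker by annotating a `Span` by hand. This is an illustrative sketch, not part of the commit; the QID values are the ones used elsewhere in this doc.

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("Ada Lovelace was born in London")

# Manually attach a KB identifier to an entity span
doc.ents = [Span(doc, 0, 2, label="PERSON", kb_id="Q7259")]

ent = doc.ents[0]
print(ent.text, ent.label_)  # Ada Lovelace PERSON
print(ent.kb_id_)            # "Q7259" (string form)
print(ent.kb_id)             # the corresponding hash value
```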
@@ -571,15 +613,6 @@ print(ent_ada_1)  # ['Lovelace', 'PERSON', 'Q7259']
 print(ent_london_5)  # ['London', 'GPE', 'Q84']
 ```
 
-| Text     | ent_type\_ | ent_kb_id\_ |
-| -------- | ---------- | ----------- |
-| Ada      | `"PERSON"` | `"Q7259"`   |
-| Lovelace | `"PERSON"` | `"Q7259"`   |
-| was      | -          | -           |
-| born     | -          | -           |
-| in       | -          | -           |
-| London   | `"GPE"`    | `"Q84"`     |
-
 ## Tokenization {#tokenization}
 
 Tokenization is the task of splitting a text into meaningful segments, called
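The tokenization section introduced in the last hunk can be tried without any trained components, since even a blank pipeline ships with the language's tokenizer rules. A sketch, not part of the commit:

```python
import spacy

# The tokenizer runs in a blank pipeline; no trained model is needed.
nlp = spacy.blank("en")
doc = nlp("Apple isn't looking at buying U.K. startup.")
tokens = [token.text for token in doc]
print(tokens)
```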