---
title: Lemmatizer
teaser: Assign the base forms of words
tag: class
source: spacy/lemmatizer.py
---

The `Lemmatizer` supports simple part-of-speech-sensitive suffix rules and
lookup tables.

## Lemmatizer.\_\_init\_\_ {#init tag="method"}

Create a `Lemmatizer`.

> #### Example
>
> ```python
> from spacy.lemmatizer import Lemmatizer
> lemmatizer = Lemmatizer()
> ```

| Name         | Type          | Description                                                      |
| ------------ | ------------- | ---------------------------------------------------------------- |
| `index`      | dict / `None` | Inventory of lemmas in the language.                             |
| `exceptions` | dict / `None` | Mapping of string forms to lemmas that bypass the `rules`.       |
| `rules`      | dict / `None` | Mapping of part-of-speech tags to lists of suffix rewrite rules. |
| `lookup`     | dict / `None` | Lookup table mapping strings to their lemmas.                    |
| **RETURNS**  | `Lemmatizer`  | The newly created object.                                        |

## Lemmatizer.\_\_call\_\_ {#call tag="method"}

Lemmatize a string.

> #### Example
>
> ```python
> from spacy.lemmatizer import Lemmatizer
> rules = {"noun": [["s", ""]]}
> lemmatizer = Lemmatizer(index={}, exceptions={}, rules=rules)
> lemmas = lemmatizer("ducks", "NOUN")
> assert lemmas == ["duck"]
> ```

| Name         | Type          | Description                                                                                              |
| ------------ | ------------- | -------------------------------------------------------------------------------------------------------- |
| `string`     | unicode       | The string to lemmatize, e.g. the token text.                                                            |
| `univ_pos`   | unicode / int | The token's universal part-of-speech tag.                                                                |
| `morphology` | dict / `None` | Morphological features following the [Universal Dependencies](http://universaldependencies.org/) scheme. |
| **RETURNS**  | list          | The available lemmas for the string.                                                                     |

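To illustrate how suffix rules, exceptions and the lemma inventory can interact, here is a minimal pure-Python sketch. It is not the library's implementation: `lemmatize_sketch` is a hypothetical helper, and the order of the fallbacks (exceptions first, then candidates found in the index, then unverified candidates, then the input itself) is an assumption made for illustration.

```python
def lemmatize_sketch(string, index, exceptions, rules):
    """Toy suffix-rule lemmatizer (illustrative only, not spaCy's source)."""
    string = string.lower()
    # Assumption: exceptions bypass the suffix rules entirely.
    if string in exceptions:
        return list(exceptions[string])
    known, unknown = [], []
    for old_suffix, new_suffix in rules:
        if not string.endswith(old_suffix):
            continue
        # Swap the matched suffix for its replacement.
        form = string[: len(string) - len(old_suffix)] + new_suffix
        if not form:
            continue
        # Prefer candidates that appear in the lemma inventory.
        target = known if form in index else unknown
        if form not in target:
            target.append(form)
    # Fall back to unverified candidates, then to the input string.
    return known or unknown or [string]


noun_rules = [["s", ""], ["ies", "y"]]
assert lemmatize_sketch("ducks", {"duck"}, {}, noun_rules) == ["duck"]
assert lemmatize_sketch("ponies", {"pony"}, {}, noun_rules) == ["pony"]
assert lemmatize_sketch("mice", set(), {"mice": ["mouse"]}, noun_rules) == ["mouse"]
```

The sketch takes the rules for a single part-of-speech as a flat list, whereas the `rules` argument in the example above is keyed by part-of-speech tag.
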
## Lemmatizer.lookup {#lookup tag="method" new="2"}

Look up a lemma in the lookup table, if available. If no lemma is found, the
original string is returned. Languages can provide a
[lookup table](/usage/adding-languages#lemmatizer) via the `resources`, set on
the individual `Language` class.

> #### Example
>
> ```python
> from spacy.lemmatizer import Lemmatizer
> lookup = {"going": "go"}
> lemmatizer = Lemmatizer(lookup=lookup)
> assert lemmatizer.lookup("going") == "go"
> ```

| Name        | Type    | Description                                                                                                 |
| ----------- | ------- | ----------------------------------------------------------------------------------------------------------- |
| `string`    | unicode | The string to look up.                                                                                      |
| `orth`      | int     | Optional hash of the string to look up. If not set, the string will be used and hashed. Defaults to `None`. |
| **RETURNS** | unicode | The lemma if the string was found, otherwise the original string.                                           |

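The fallback behavior and the optional `orth` argument described above can be sketched without spaCy. Everything in this block is hypothetical: `toy_hash` stands in for the library's string hashing, and `ToyTable` is a simplified table that hashes string keys on access. It only shows the idea that a caller may pass either the string or a precomputed hash.

```python
import zlib


def toy_hash(string):
    """Stable stand-in for a string-to-ID hash (hypothetical helper)."""
    return zlib.crc32(string.encode("utf8"))


class ToyTable:
    """Dict-like table keyed by hashes; string keys are hashed on access."""

    def __init__(self, data=None):
        self._data = {}
        for key, value in (data or {}).items():
            self[key] = value

    @staticmethod
    def _key(key):
        # Strings are hashed; integers are assumed to be hashes already.
        return toy_hash(key) if isinstance(key, str) else key

    def __setitem__(self, key, value):
        self._data[self._key(key)] = value

    def __getitem__(self, key):
        return self._data[self._key(key)]

    def __contains__(self, key):
        return self._key(key) in self._data


def lookup_sketch(table, string, orth=None):
    """Return the lemma for `string`, or `string` itself if none is stored."""
    key = orth if orth is not None else string
    if key in table:
        return table[key]
    return string


table = ToyTable({"going": "go"})
assert lookup_sketch(table, "going") == "go"
assert lookup_sketch(table, "going", orth=toy_hash("going")) == "go"
assert lookup_sketch(table, "went") == "went"
```
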
## Lemmatizer.is_base_form {#is_base_form tag="method"}

Check whether we're dealing with an uninflected paradigm, so we can avoid
lemmatization entirely.

> #### Example
>
> ```python
> from spacy.lemmatizer import Lemmatizer
> lemmatizer = Lemmatizer()
> pos = "verb"
> morph = {"VerbForm": "inf"}
> is_base_form = lemmatizer.is_base_form(pos, morph)
> assert is_base_form
> ```

| Name         | Type          | Description                                                                             |
| ------------ | ------------- | --------------------------------------------------------------------------------------- |
| `univ_pos`   | unicode / int | The token's universal part-of-speech tag.                                               |
| `morphology` | dict          | The token's morphological features.                                                     |
| **RETURNS**  | bool          | Whether the token's part-of-speech tag and morphological features describe a base form. |

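As a rough illustration of what such a check can look like, the sketch below treats a few tag and feature combinations as base forms. The function name `is_base_form_sketch` is hypothetical, and the specific combinations (singular nouns, infinitive verbs, positive-degree adjectives) are assumptions chosen for the example; the library's actual rules may cover more or different cases.

```python
def is_base_form_sketch(univ_pos, morphology=None):
    """Toy base-form check (illustrative only; the feature rules are assumed)."""
    morphology = morphology or {}
    # Singular nouns are already in their base form.
    if univ_pos == "noun" and morphology.get("Number") == "sing":
        return True
    # Infinitive verbs are already in their base form.
    if univ_pos == "verb" and morphology.get("VerbForm") == "inf":
        return True
    # Positive-degree adjectives are already in their base form.
    if univ_pos == "adj" and morphology.get("Degree") == "pos":
        return True
    return False


assert is_base_form_sketch("verb", {"VerbForm": "inf"})
assert not is_base_form_sketch("verb", {"VerbForm": "part"})
assert not is_base_form_sketch("noun")
```

When the check succeeds, a caller can return the lowercased input directly and skip the suffix rules.
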
## Attributes {#attributes}

| Name                                      | Type          | Description                                                      |
| ----------------------------------------- | ------------- | ---------------------------------------------------------------- |
| `index`                                   | dict / `None` | Inventory of lemmas in the language.                             |
| `exc`                                     | dict / `None` | Mapping of string forms to lemmas that bypass the `rules`.       |
| `rules`                                   | dict / `None` | Mapping of part-of-speech tags to lists of suffix rewrite rules. |
| `lookup_table` <Tag variant="new">2</Tag> | dict / `None` | The lemma lookup table, if available.                            |