mirror of
				https://github.com/explosion/spaCy.git
				synced 2025-10-30 23:47:31 +03:00 
			
		
		
		
	
		
			
				
	
	
	
		
			11 KiB
		
	
	
	
	
	
	
	
			
		
		
	
	
			11 KiB
		
	
	
	
	
	
	
	
| title | teaser | tag | source | 
|---|---|---|---|
| Lexeme | An entry in the vocabulary | class | spacy/lexeme.pyx | 
A Lexeme has no string context – it's a word type, as opposed to a word token.
It therefore has no part-of-speech tag, dependency parse, or lemma (if
lemmatization depends on the part-of-speech tag).
Lexeme.__init__
Create a Lexeme object.
| Name | Type | Description | 
|---|---|---|
| vocab | Vocab | The parent vocabulary. | 
| orth | int | The orth id of the lexeme. | 
| RETURNS | Lexeme | The newly constructed object. | 
Lexeme.set_flag
Change the value of a boolean flag.
Example
COOL_FLAG = nlp.vocab.add_flag(lambda text: False) nlp.vocab["spaCy"].set_flag(COOL_FLAG, True)
| Name | Type | Description | 
|---|---|---|
| flag_id | int | The attribute ID of the flag to set. | 
| value | bool | The new value of the flag. | 
Lexeme.check_flag
Check the value of a boolean flag.
Example
is_my_library = lambda text: text in ["spaCy", "Thinc"] MY_LIBRARY = nlp.vocab.add_flag(is_my_library) assert nlp.vocab["spaCy"].check_flag(MY_LIBRARY) == True
| Name | Type | Description | 
|---|---|---|
| flag_id | int | The attribute ID of the flag to query. | 
| RETURNS | bool | The value of the flag. | 
Lexeme.similarity
Compute a semantic similarity estimate. Defaults to cosine over vectors.
Example
apple = nlp.vocab["apple"] orange = nlp.vocab["orange"] apple_orange = apple.similarity(orange) orange_apple = orange.similarity(apple) assert apple_orange == orange_apple
| Name | Type | Description | 
|---|---|---|
| other | - | The object to compare with. By default, accepts Doc,Span,TokenandLexemeobjects. | 
| RETURNS | float | A scalar similarity score. Higher is more similar. | 
Lexeme.has_vector
A boolean value indicating whether a word vector is associated with the lexeme.
Example
apple = nlp.vocab["apple"] assert apple.has_vector
| Name | Type | Description | 
|---|---|---|
| RETURNS | bool | Whether the lexeme has a vector data attached. | 
Lexeme.vector
A real-valued meaning representation.
Example
apple = nlp.vocab["apple"] assert apple.vector.dtype == "float32" assert apple.vector.shape == (300,)
| Name | Type | Description | 
|---|---|---|
| RETURNS | numpy.ndarray[ndim=1, dtype='float32'] | A 1D numpy array representing the lexeme's semantics. | 
Lexeme.vector_norm
The L2 norm of the lexeme's vector representation.
Example
apple = nlp.vocab["apple"] pasta = nlp.vocab["pasta"] apple.vector_norm # 7.1346845626831055 pasta.vector_norm # 7.759851932525635 assert apple.vector_norm != pasta.vector_norm
| Name | Type | Description | 
|---|---|---|
| RETURNS | float | The L2 norm of the vector representation. | 
Attributes
| Name | Type | Description | 
|---|---|---|
| vocab | Vocab | The lexeme's vocabulary. | 
| text | unicode | Verbatim text content. | 
| orth | int | ID of the verbatim text content. | 
| orth_ | unicode | Verbatim text content (identical to Lexeme.text). Exists mostly for consistency with the other attributes. | 
| rank | int | Sequential ID of the lexemes's lexical type, used to index into tables, e.g. for word vectors. | 
| flags | int | Container of the lexeme's binary flags. | 
| norm | int | The lexemes's norm, i.e. a normalized form of the lexeme text. | 
| norm_ | unicode | The lexemes's norm, i.e. a normalized form of the lexeme text. | 
| lower | int | Lowercase form of the word. | 
| lower_ | unicode | Lowercase form of the word. | 
| shape | int | Transform of the word's string, to show orthographic features. | 
| shape_ | unicode | Transform of the word's string, to show orthographic features. | 
| prefix | int | Length-N substring from the start of the word. Defaults to N=1. | 
| prefix_ | unicode | Length-N substring from the start of the word. Defaults to N=1. | 
| suffix | int | Length-N substring from the end of the word. Defaults to N=3. | 
| suffix_ | unicode | Length-N substring from the start of the word. Defaults to N=3. | 
| is_alpha | bool | Does the lexeme consist of alphabetic characters? Equivalent to lexeme.text.isalpha(). | 
| is_ascii | bool | Does the lexeme consist of ASCII characters? Equivalent to [any(ord(c) >= 128 for c in lexeme.text)]. | 
| is_digit | bool | Does the lexeme consist of digits? Equivalent to lexeme.text.isdigit(). | 
| is_lower | bool | Is the lexeme in lowercase? Equivalent to lexeme.text.islower(). | 
| is_upper | bool | Is the lexeme in uppercase? Equivalent to lexeme.text.isupper(). | 
| is_title | bool | Is the lexeme in titlecase? Equivalent to lexeme.text.istitle(). | 
| is_punct | bool | Is the lexeme punctuation? | 
| is_left_punct | bool | Is the lexeme a left punctuation mark, e.g. (? | 
| is_right_punct | bool | Is the lexeme a right punctuation mark, e.g. )? | 
| is_space | bool | Does the lexeme consist of whitespace characters? Equivalent to lexeme.text.isspace(). | 
| is_bracket | bool | Is the lexeme a bracket? | 
| is_quote | bool | Is the lexeme a quotation mark? | 
| is_currency2.0.8 | bool | Is the lexeme a currency symbol? | 
| like_url | bool | Does the lexeme resemble a URL? | 
| like_num | bool | Does the lexeme represent a number? e.g. "10.9", "10", "ten", etc. | 
| like_email | bool | Does the lexeme resemble an email address? | 
| is_oov | bool | Is the lexeme out-of-vocabulary? | 
| is_stop | bool | Is the lexeme part of a "stop list"? | 
| lang | int | Language of the parent vocabulary. | 
| lang_ | unicode | Language of the parent vocabulary. | 
| prob | float | Smoothed log probability estimate of the lexeme's word type (context-independent entry in the vocabulary). | 
| cluster | int | Brown cluster ID. | 
| sentiment | float | A scalar value indicating the positivity or negativity of the lexeme. |