| title | teaser | tag | source | 
|---|---|---|---|
| Lexeme | An entry in the vocabulary | class | spacy/lexeme.pyx | 
A Lexeme has no string context – it's a word type, as opposed to a word token.
It therefore has no part-of-speech tag, dependency parse, or lemma (if
lemmatization depends on the part-of-speech tag).
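The type/token distinction can be illustrated without spaCy. The following is a minimal sketch in plain Python; `ToyLexeme`, `ToyToken` and `ToyVocab` are hypothetical names made up for this illustration and are not spaCy classes. It only shows that several tokens in a text can point at one shared, context-free vocabulary entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyLexeme:
    """Hypothetical vocabulary entry: holds context-independent facts only."""
    text: str
    lower: str
    is_alpha: bool


@dataclass(frozen=True)
class ToyToken:
    """Hypothetical token: a lexeme plus its position in a particular text."""
    lexeme: ToyLexeme
    index: int


class ToyVocab:
    """Hypothetical vocabulary that creates one entry per distinct string."""

    def __init__(self):
        self._entries = {}

    def __getitem__(self, text):
        # Create the entry on first lookup, then always return the same object.
        if text not in self._entries:
            self._entries[text] = ToyLexeme(text, text.lower(), text.isalpha())
        return self._entries[text]

    def __len__(self):
        return len(self._entries)


vocab = ToyVocab()
words = "the cat saw the dog".split()
tokens = [ToyToken(vocab[word], i) for i, word in enumerate(words)]

# Five tokens, but only four word types: both "the" tokens share one lexeme.
assert len(tokens) == 5
assert len(vocab) == 4
assert tokens[0].lexeme is tokens[3].lexeme
assert tokens[0] != tokens[3]  # different positions, so different tokens
```

Anything that depends on the surrounding sentence, such as a part-of-speech tag, would have to live on the token in this model, which is why the lexeme has no field for it.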
Lexeme.__init__
Create a Lexeme object.
| Name | Description | 
|---|---|
| vocab | The parent vocabulary. |
| orth | The orth id of the lexeme. |
Lexeme.set_flag
Change the value of a boolean flag.
Example
COOL_FLAG = nlp.vocab.add_flag(lambda text: False)
nlp.vocab["spaCy"].set_flag(COOL_FLAG, True)
| Name | Description | 
|---|---|
| flag_id | The attribute ID of the flag to set. |
| value | The new value of the flag. |
Lexeme.check_flag
Check the value of a boolean flag.
Example
is_my_library = lambda text: text in ["spaCy", "Thinc"]
MY_LIBRARY = nlp.vocab.add_flag(is_my_library)
assert nlp.vocab["spaCy"].check_flag(MY_LIBRARY) == True
| Name | Description | 
|---|---|
| flag_id | The attribute ID of the flag to query. |
| RETURNS | The value of the flag.  | 
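One way to picture `set_flag` and `check_flag` is as operations on a set of bits indexed by flag ID. The sketch below is a plain-Python model under that assumption; `ToyFlags` and the ID value `5` are hypothetical and chosen only for illustration, and nothing here is taken from spaCy's source.

```python
class ToyFlags:
    """Hypothetical model: boolean flags packed into a single integer."""

    def __init__(self):
        self.bits = 0

    def set_flag(self, flag_id, value):
        # Turn the bit at position flag_id on or off.
        if value:
            self.bits |= 1 << flag_id
        else:
            self.bits &= ~(1 << flag_id)

    def check_flag(self, flag_id):
        # True if the bit at position flag_id is set.
        return bool(self.bits & (1 << flag_id))


COOL_FLAG = 5  # hypothetical flag ID, standing in for an ID handed out by the vocab
flags = ToyFlags()

assert flags.check_flag(COOL_FLAG) is False
flags.set_flag(COOL_FLAG, True)
assert flags.check_flag(COOL_FLAG) is True
flags.set_flag(COOL_FLAG, False)
assert flags.check_flag(COOL_FLAG) is False
```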
Lexeme.similarity
Compute a semantic similarity estimate. Defaults to cosine over vectors.
Example
apple = nlp.vocab["apple"]
orange = nlp.vocab["orange"]
apple_orange = apple.similarity(orange)
orange_apple = orange.similarity(apple)
assert apple_orange == orange_apple
| Name | Description | 
|---|---|
| other | The object to compare with. By default, accepts Doc, Span, Token and Lexeme objects.  | 
| RETURNS | A scalar similarity score. Higher is more similar.  | 
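For reference, cosine similarity over two vectors can be sketched in plain Python as below. This is an illustration of the formula, not spaCy's implementation: `cosine_similarity` is a hypothetical helper, the three-dimensional vectors are invented toy values, and returning `0.0` for a zero-length vector is an assumption made here to avoid dividing by zero.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (hypothetical helper)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption for this sketch: treat an all-zero vector as "no similarity".
        return 0.0
    return dot / (norm_u * norm_v)


# Toy vectors, invented for the example (real word vectors are much longer).
apple = [1.0, 2.0, 0.0]
orange = [2.0, 1.0, 0.0]

apple_orange = cosine_similarity(apple, orange)
orange_apple = cosine_similarity(orange, apple)

# The measure is symmetric, matching the assertion in the example above.
assert apple_orange == orange_apple
```

Because the dot product and the product of the norms do not depend on argument order, the score is the same in both directions.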
Lexeme.has_vector
A boolean value indicating whether a word vector is associated with the lexeme.
Example
apple = nlp.vocab["apple"]
assert apple.has_vector
| Name | Description | 
|---|---|
| RETURNS | Whether the lexeme has vector data attached. |
Lexeme.vector
A real-valued meaning representation.
Example
apple = nlp.vocab["apple"]
assert apple.vector.dtype == "float32"
assert apple.vector.shape == (300,)
| Name | Description | 
|---|---|
| RETURNS | A 1-dimensional array representing the lexeme's vector.  | 
Lexeme.vector_norm
The L2 norm of the lexeme's vector representation.
Example
apple = nlp.vocab["apple"]
pasta = nlp.vocab["pasta"]
apple.vector_norm  # 7.1346845626831055
pasta.vector_norm  # 7.759851932525635
assert apple.vector_norm != pasta.vector_norm
| Name | Description | 
|---|---|
| RETURNS | The L2 norm of the vector representation.  | 
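The L2 norm is the square root of the sum of squared components. A minimal plain-Python sketch follows; `l2_norm` is a hypothetical helper and the vectors are toy values, not real word vectors.

```python
import math


def l2_norm(vector):
    """Euclidean length of a vector (hypothetical helper for illustration)."""
    return math.sqrt(sum(x * x for x in vector))


# Toy two-dimensional vectors, invented for the example.
short_vec = [3.0, 4.0]
long_vec = [6.0, 8.0]

assert l2_norm(short_vec) == 5.0
assert l2_norm(long_vec) == 10.0
```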
Attributes
| Name | Description | 
|---|---|
| vocab | The lexeme's vocabulary. |
| text | Verbatim text content. |
| orth | ID of the verbatim text content. |
| orth_ | Verbatim text content (identical to Lexeme.text). Exists mostly for consistency with the other attributes. |
| rank | Sequential ID of the lexeme's lexical type, used to index into tables, e.g. for word vectors. |
| flags | Container of the lexeme's binary flags. |
| norm | The lexeme's norm, i.e. a normalized form of the lexeme text. |
| norm_ | The lexeme's norm, i.e. a normalized form of the lexeme text. |
| lower | Lowercase form of the word. |
| lower_ | Lowercase form of the word. |
| shape | Transform of the word's string, to show orthographic features. Alphabetic characters are replaced by x or X, numeric characters are replaced by d, and sequences of the same character are truncated after length 4. For example, "Xxxx" or "dd". |
| shape_ | Transform of the word's string, to show orthographic features. Alphabetic characters are replaced by x or X, numeric characters are replaced by d, and sequences of the same character are truncated after length 4. For example, "Xxxx" or "dd". |
| prefix | Length-N substring from the start of the word. Defaults to N=1. |
| prefix_ | Length-N substring from the start of the word. Defaults to N=1. |
| suffix | Length-N substring from the end of the word. Defaults to N=3. |
| suffix_ | Length-N substring from the end of the word. Defaults to N=3. |
| is_alpha | Does the lexeme consist of alphabetic characters? Equivalent to lexeme.text.isalpha(). |
| is_ascii | Does the lexeme consist of ASCII characters? Equivalent to not any(ord(c) >= 128 for c in lexeme.text). |
| is_digit | Does the lexeme consist of digits? Equivalent to lexeme.text.isdigit(). |
| is_lower | Is the lexeme in lowercase? Equivalent to lexeme.text.islower(). |
| is_upper | Is the lexeme in uppercase? Equivalent to lexeme.text.isupper(). |
| is_title | Is the lexeme in titlecase? Equivalent to lexeme.text.istitle(). |
| is_punct | Is the lexeme punctuation? |
| is_left_punct | Is the lexeme a left punctuation mark, e.g. "("? |
| is_right_punct | Is the lexeme a right punctuation mark, e.g. ")"? |
| is_space | Does the lexeme consist of whitespace characters? Equivalent to lexeme.text.isspace(). |
| is_bracket | Is the lexeme a bracket? |
| is_quote | Is the lexeme a quotation mark? |
| is_currency (new in 2.0.8) | Is the lexeme a currency symbol? |
| like_url | Does the lexeme resemble a URL? |
| like_num | Does the lexeme represent a number? e.g. "10.9", "10", "ten", etc. |
| like_email | Does the lexeme resemble an email address? |
| is_oov | Is the lexeme out-of-vocabulary (i.e. does it not have a word vector)? |
| is_stop | Is the lexeme part of a "stop list"? |
| lang | Language of the parent vocabulary. |
| lang_ | Language of the parent vocabulary. |
| prob | Smoothed log probability estimate of the lexeme's word type (context-independent entry in the vocabulary). |
| cluster | Brown cluster ID. |
| sentiment | A scalar value indicating the positivity or negativity of the lexeme. |
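Several of the string-based attributes can be approximated directly from their descriptions. The sketch below is plain Python written from the table's wording, not from spaCy's source; `word_shape`, `prefix`, `suffix` and `is_ascii` are hypothetical helper names, and leaving non-alphanumeric characters unchanged in the shape is an assumption of this sketch.

```python
def word_shape(text):
    """Map letters to x/X and digits to d, keeping at most 4 repeats in a row."""
    shape = []
    last = ""
    run = 0
    for char in text:
        if char.isalpha():
            mapped = "X" if char.isupper() else "x"
        elif char.isdigit():
            mapped = "d"
        else:
            mapped = char  # assumption: other characters pass through unchanged
        # Count how many times the same shape character has repeated.
        run = run + 1 if mapped == last else 1
        last = mapped
        if run <= 4:
            shape.append(mapped)
    return "".join(shape)


def prefix(text, n=1):
    """Length-n substring from the start of the word."""
    return text[:n]


def suffix(text, n=3):
    """Length-n substring from the end of the word."""
    return text[-n:]


def is_ascii(text):
    """True if every character has a code point below 128."""
    return not any(ord(c) >= 128 for c in text)


assert word_shape("Apple") == "Xxxxx"
assert word_shape("spaCy") == "xxxXx"
assert word_shape("10") == "dd"
assert prefix("spaCy") == "s"
assert suffix("spaCy") == "aCy"
assert is_ascii("spaCy") and not is_ascii("café")
```

Truncating runs keeps the number of distinct shapes small, so long words such as "internationalization" collapse to the same shape as any other all-lowercase word of four or more letters.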