Mirror of https://github.com/explosion/spaCy.git
Update docs [ci skip]

parent ec751068f3
commit a2c8cda26f
```diff
@@ -31,18 +31,18 @@ to predict. Otherwise, you could try using a "one-shot learning" approach using
 <Accordion title="What’s the difference between word vectors and language models?" id="vectors-vs-language-models">
 
 [Transformers](#transformers) are large and powerful neural networks that give
-you better accuracy, but are harder to deploy in production, as they require a GPU to run
-effectively. [Word vectors](#word-vectors) are a slightly older technique that
-can give your models a smaller improvement in accuracy, and can also provide
-some additional capabilities.
+you better accuracy, but are harder to deploy in production, as they require a
+GPU to run effectively. [Word vectors](#word-vectors) are a slightly older
+technique that can give your models a smaller improvement in accuracy, and can
+also provide some additional capabilities.
 
-The key difference between word-vectors and contextual language
-models such as transformers is that word vectors model **lexical types**, rather
-than _tokens_. If you have a list of terms with no context around them, a transformer
-model like BERT can't really help you. BERT is designed to understand language
-**in context**, which isn't what you have. A word vectors table will be a much
-better fit for your task. However, if you do have words in context — whole sentences
-or paragraphs of running text — word vectors will only provide a very rough
-approximation of what the text is about.
+The key difference between word-vectors and contextual language models such as
+transformers is that word vectors model **lexical types**, rather than _tokens_.
+If you have a list of terms with no context around them, a transformer model
+like BERT can't really help you. BERT is designed to understand language **in
+context**, which isn't what you have. A word vectors table will be a much better
+fit for your task. However, if you do have words in context — whole sentences or
+paragraphs of running text — word vectors will only provide a very rough
+approximation of what the text is about.
 
 Word vectors are also very computationally efficient, as they map a word to a
```
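The distinction this hunk draws between lexical types and tokens is easy to demonstrate. Below is a minimal sketch, assuming a vectors-equipped pipeline such as `en_core_web_md` is installed; the package name and printed values are illustrative, not part of the commit. Isolated terms still get useful vectors, and a word's static vector is identical in every context.

```python
import numpy
import spacy

# Assumes a pipeline with a word vectors table is installed,
# e.g. via: python -m spacy download en_core_web_md
nlp = spacy.load("en_core_web_md")

# Word vectors model lexical types: bare terms with no context
# around them still get meaningful vectors and similarities.
cat, dog, banana = nlp("cat dog banana")
print(cat.similarity(dog))     # relatively high
print(cat.similarity(banana))  # noticeably lower

# The vector belongs to the type, not the token: "cat" gets the
# same static vector no matter which sentence it appears in.
cat_in_context = nlp("The cat sat on the mat.")[1]
print(numpy.array_equal(cat.vector, cat_in_context.vector))  # True
```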
```diff
@@ -484,28 +484,32 @@ training.
 
 ## Static vectors {#static-vectors}
 
-If your pipeline includes a word vectors table, you'll be able to use the
-`.similarity()` method on the `Doc`, `Span`, `Token` and `Lexeme` objects.
-You'll also be able to access the vectors using the `.vector` attribute, or you
-can look up one or more vectors directly using the `Vocab` object. Pipelines
-with word vectors can also use the vectors as features for the statistical
-models, which can improve the accuracy of your components.
+If your pipeline includes a **word vectors table**, you'll be able to use the
+`.similarity()` method on the [`Doc`](/api/doc), [`Span`](/api/span),
+[`Token`](/api/token) and [`Lexeme`](/api/lexeme) objects. You'll also be able
+to access the vectors using the `.vector` attribute, or you can look up one or
+more vectors directly using the [`Vocab`](/api/vocab) object. Pipelines with
+word vectors can also **use the vectors as features** for the statistical
+models, which can **improve the accuracy** of your components.
 
```
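To make the API being linked up above concrete, here is a short sketch of the `.similarity()` and `.vector` usage the paragraph describes, again assuming a vectors-equipped pipeline like `en_core_web_md`; the example texts are made up. The hunk continues below.

```python
import spacy

nlp = spacy.load("en_core_web_md")
doc1 = nlp("I like salty fries and hamburgers.")
doc2 = nlp("Fast food tastes very good.")

# .similarity() works on Doc, Span, Token and Lexeme objects
print(doc1.similarity(doc2))            # Doc vs. Doc
print(doc1[2:4].similarity(doc2[0:2]))  # Span vs. Span

# .vector exposes the underlying vector; a Doc's vector is the
# average of its token vectors
print(doc1.vector.shape)

# You can also look up vectors directly on the Vocab, which
# returns a Lexeme
print(nlp.vocab["fries"].vector[:5])
```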
```diff
 Word vectors in spaCy are "static" in the sense that they are not learned
 parameters of the statistical models, and spaCy itself does not feature any
 algorithms for learning word vector tables. You can train a word vectors table
-using tools such as Gensim, word2vec, FastText or GloVe. There are also many
-word vector tables available for download. Once you have a word vectors table
-you want to use, you can convert it for use with spaCy using the `spacy init vocab`
-command, which will give you a directory you can load or refer to in your training
-configs.
+using tools such as [Gensim](https://radimrehurek.com/gensim/),
+[FastText](https://fasttext.cc/) or
+[GloVe](https://nlp.stanford.edu/projects/glove/), or download existing
+pretrained vectors. The [`init vocab`](/api/cli#init-vocab) command lets you
+convert vectors for use with spaCy and will give you a directory you can load or
+refer to in your [training configs](/usage/training#config).
 
```
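The conversion step this paragraph describes is a single CLI call. Below is a hedged sketch of what an invocation might have looked like at the time of this commit; the paths and values are placeholders, and the exact flags may vary by version, so check `python -m spacy init vocab --help`. The hunk continues after the example.

```bash
# Convert a pretrained vectors file into a spaCy-loadable directory.
# "en", the paths and the pruning size are illustrative values.
python -m spacy init vocab en ./vectors_model \
    --vectors-loc vectors.txt.gz \
    --prune-vectors 20000
```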
```diff
-When converting the vectors, there are two ways you can trim them down to make
-your package smaller. You can _truncate_ the vectors with the `--truncate-vectors`
-option, which will remove entries for rarer words from the table. Alternatively,
-you can use the `--prune-vectors` option to remap rarer words to the closest vector
-that remains in the table. This allows the vectors table to return meaningful
-(albeit imperfect) results for more words than you have rows in the table.
+<Infobox title="Word vectors and similarity" emoji="📖">
+
+For more details on loading word vectors into spaCy, using them for similarity
+and improving word vector coverage by truncating and pruning the vectors, see
+the usage guide on
+[word vectors and similarity](/usage/linguistic-features#vectors-similarity).
+
+</Infobox>
 
 ### Using word vectors in your models {#word-vectors-models}
 
```
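The pruning behavior the removed paragraph explained, remapping rarer words to the closest vector that stays in the table, can be sketched in plain numpy. This is a conceptual illustration only, not spaCy's implementation; all names and sizes are made up.

```python
import numpy as np

# Toy vectors table, rows ordered by word frequency
words = ["the", "cat", "dog", "banana", "kitteh"]
vectors = np.random.default_rng(0).normal(size=(5, 4)).astype("float32")

# "Prune" to the n_keep most frequent rows, then remap each dropped
# word to its closest surviving row by cosine similarity.
n_keep = 3
kept = vectors[:n_keep]
kept_norm = kept / np.linalg.norm(kept, axis=1, keepdims=True)

remap = {}
for i in range(n_keep, len(words)):
    v = vectors[i] / np.linalg.norm(vectors[i])
    remap[words[i]] = words[int(np.argmax(kept_norm @ v))]

# Pruned words now resolve to rows still in the table, so lookups
# return meaningful (if imperfect) vectors.
print(remap)
```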