From ec751068f328e47ae7fa8ca1745a1dd8ac00529d Mon Sep 17 00:00:00 2001
From: Matthew Honnibal
Date: Thu, 17 Sep 2020 16:42:53 +0200
Subject: [PATCH 1/2] Draft text for static vectors intro

---
 website/docs/usage/embeddings-transformers.md | 45 +++++++++++++++----
 1 file changed, 36 insertions(+), 9 deletions(-)

diff --git a/website/docs/usage/embeddings-transformers.md b/website/docs/usage/embeddings-transformers.md
index 8dd104ead..6a239cb1e 100644
--- a/website/docs/usage/embeddings-transformers.md
+++ b/website/docs/usage/embeddings-transformers.md
@@ -30,14 +30,20 @@ to predict. Otherwise, you could try using a "one-shot learning" approach using
 
-The key difference between [word vectors](#word-vectors) and contextual language
-models such as [transformers](#transformers) is that word vectors model
-**lexical types**, rather than _tokens_. If you have a list of terms with no
-context around them, a transformer model like BERT can't really help you. BERT
-is designed to understand language **in context**, which isn't what you have. A
-word vectors table will be a much better fit for your task. However, if you do
-have words in context — whole sentences or paragraphs of running text — word
-vectors will only provide a very rough approximation of what the text is about.
+[Transformers](#transformers) are large and powerful neural networks that give
+you better accuracy, but are harder to deploy in production, as they require a GPU to run
+effectively. [Word vectors](#word-vectors) are a slightly older technique that
+can give your models a smaller improvement in accuracy, and can also provide
+some additional capabilities.
+
+The key difference between word-vectors and contextual language
+models such as transformers is that word vectors model **lexical types**, rather
+than _tokens_. If you have a list of terms with no context around them, a transformer
+model like BERT can't really help you. BERT is designed to understand language
+**in context**, which isn't what you have. A word vectors table will be a much
+better fit for your task. However, if you do have words in context — whole sentences
+or paragraphs of running text — word vectors will only provide a very rough
+approximation of what the text is about.
 
 Word vectors are also very computationally efficient, as they map a word to a
 vector with a single indexing operation. Word vectors are therefore useful as a
@@ -478,7 +484,28 @@ training.
 
 ## Static vectors {#static-vectors}
 
-
+If your pipeline includes a word vectors table, you'll be able to use the
+`.similarity()` method on the `Doc`, `Span`, `Token` and `Lexeme` objects.
+You'll also be able to access the vectors using the `.vector` attribute, or you
+can look up one or more vectors directly using the `Vocab` object. Pipelines
+with word vectors can also use the vectors as features for the statistical
+models, which can improve the accuracy of your components.
+
+Word vectors in spaCy are "static" in the sense that they are not learned
+parameters of the statistical models, and spaCy itself does not feature any
+algorithms for learning word vector tables. You can train a word vectors table
+using tools such as Gensim, word2vec, FastText or GloVe. There are also many
+word vector tables available for download. Once you have a word vectors table
+you want to use, you can convert it for use with spaCy using the `spacy init vocab`
+command, which will give you a directory you can load or refer to in your training
+configs.
+
+When converting the vectors, there are two ways you can trim them down to make
+your package smaller. You can _truncate_ the vectors with the `--truncate-vectors`
+option, which will remove entries for rarer words from the table. Alternatively,
+you can use the `--prune-vectors` option to remap rarer words to the closest vector
+that remains in the table. This allows the vectors table to return meaningful
+(albeit imperfect) results for more words than you have rows in the table.
 
 ### Using word vectors in your models {#word-vectors-models}
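
As a rough sketch of the conversion workflow the first patch describes: the names below are illustrative, assuming `./my_vocab` is a hypothetical directory produced by the `spacy init vocab` command, with the vectors trimmed at conversion time using one of the two options named in the patch.

```python
import spacy

# Load the output directory of `spacy init vocab` (hypothetical path). The
# patch notes that this directory can be loaded directly or referenced in a
# training config.
nlp = spacy.load("./my_vocab")

# The static vectors table lives on the vocab. Its shape (rows, width)
# reflects any trimming: --truncate-vectors drops rows for rarer words
# outright, while --prune-vectors remaps rarer words onto the closest
# remaining row.
print(nlp.vocab.vectors.shape)

# Look up a single word's vector via its Lexeme. With pruned vectors, a rarer
# word that lost its own row still resolves to the shared vector of its
# nearest remaining neighbour, so the lookup stays meaningful, if imperfect.
print(nlp.vocab["apple"].vector[:5])
```
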
From a2c8cda26ffbc6ba0e15b0872b8691ee4f366994 Mon Sep 17 00:00:00 2001
From: Ines Montani
Date: Thu, 17 Sep 2020 17:12:51 +0200
Subject: [PATCH 2/2] Update docs [ci skip]

---
 website/docs/usage/embeddings-transformers.md | 60 ++++++++++---------
 1 file changed, 32 insertions(+), 28 deletions(-)

diff --git a/website/docs/usage/embeddings-transformers.md b/website/docs/usage/embeddings-transformers.md
index 6a239cb1e..9f73661c3 100644
--- a/website/docs/usage/embeddings-transformers.md
+++ b/website/docs/usage/embeddings-transformers.md
@@ -31,18 +31,18 @@ to predict. Otherwise, you could try using a "one-shot learning" approach using
 
 [Transformers](#transformers) are large and powerful neural networks that give
-you better accuracy, but are harder to deploy in production, as they require a GPU to run
-effectively. [Word vectors](#word-vectors) are a slightly older technique that
-can give your models a smaller improvement in accuracy, and can also provide
-some additional capabilities.
+you better accuracy, but are harder to deploy in production, as they require a
+GPU to run effectively. [Word vectors](#word-vectors) are a slightly older
+technique that can give your models a smaller improvement in accuracy, and can
+also provide some additional capabilities.
 
-The key difference between word-vectors and contextual language
-models such as transformers is that word vectors model **lexical types**, rather
-than _tokens_. If you have a list of terms with no context around them, a transformer
-model like BERT can't really help you. BERT is designed to understand language
-**in context**, which isn't what you have. A word vectors table will be a much
-better fit for your task. However, if you do have words in context — whole sentences
-or paragraphs of running text — word vectors will only provide a very rough
+The key difference between word-vectors and contextual language models such as
+transformers is that word vectors model **lexical types**, rather than _tokens_.
+If you have a list of terms with no context around them, a transformer model
+like BERT can't really help you. BERT is designed to understand language **in
+context**, which isn't what you have. A word vectors table will be a much better
+fit for your task. However, if you do have words in context — whole sentences or
+paragraphs of running text — word vectors will only provide a very rough
 approximation of what the text is about.
 
 Word vectors are also very computationally efficient, as they map a word to a
@@ -484,28 +484,32 @@ training.
 
 ## Static vectors {#static-vectors}
 
-If your pipeline includes a word vectors table, you'll be able to use the
-`.similarity()` method on the `Doc`, `Span`, `Token` and `Lexeme` objects.
-You'll also be able to access the vectors using the `.vector` attribute, or you
-can look up one or more vectors directly using the `Vocab` object. Pipelines
-with word vectors can also use the vectors as features for the statistical
-models, which can improve the accuracy of your components.
+If your pipeline includes a **word vectors table**, you'll be able to use the
+`.similarity()` method on the [`Doc`](/api/doc), [`Span`](/api/span),
+[`Token`](/api/token) and [`Lexeme`](/api/lexeme) objects. You'll also be able
+to access the vectors using the `.vector` attribute, or you can look up one or
+more vectors directly using the [`Vocab`](/api/vocab) object. Pipelines with
+word vectors can also **use the vectors as features** for the statistical
+models, which can **improve the accuracy** of your components.
 
 Word vectors in spaCy are "static" in the sense that they are not learned
 parameters of the statistical models, and spaCy itself does not feature any
 algorithms for learning word vector tables. You can train a word vectors table
-using tools such as Gensim, word2vec, FastText or GloVe. There are also many
-word vector tables available for download. Once you have a word vectors table
-you want to use, you can convert it for use with spaCy using the `spacy init vocab`
-command, which will give you a directory you can load or refer to in your training
-configs.
+using tools such as [Gensim](https://radimrehurek.com/gensim/),
+[FastText](https://fasttext.cc/) or
+[GloVe](https://nlp.stanford.edu/projects/glove/), or download existing
+pretrained vectors. The [`init vocab`](/api/cli#init-vocab) command lets you
+convert vectors for use with spaCy and will give you a directory you can load or
+refer to in your [training configs](/usage/training#config).
 
-When converting the vectors, there are two ways you can trim them down to make
-your package smaller. You can _truncate_ the vectors with the `--truncate-vectors`
-option, which will remove entries for rarer words from the table. Alternatively,
-you can use the `--prune-vectors` option to remap rarer words to the closest vector
-that remains in the table. This allows the vectors table to return meaningful
-(albeit imperfect) results for more words than you have rows in the table.
+
+For more details on loading word vectors into spaCy, using them for similarity
+and improving word vector coverage by truncating and pruning the vectors, see
+the usage guide on
+[word vectors and similarity](/usage/linguistic-features#vectors-similarity).
+
 
 ### Using word vectors in your models {#word-vectors-models}
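
To see the APIs from the final version of this section in action, here is a minimal sketch, assuming a pipeline that ships with static vectors, such as `en_core_web_md`, is installed.

```python
import spacy

# Assumes a pipeline with a word vectors table is installed, e.g. via:
#   python -m spacy download en_core_web_md
nlp = spacy.load("en_core_web_md")

doc1 = nlp("I like salty fries and hamburgers.")
doc2 = nlp("Fast food tastes very good.")

# Doc, Span, Token and Lexeme objects all expose .similarity() and .vector.
print(doc1.similarity(doc2))          # Doc vs. Doc
print(doc1[2:4].similarity(doc1[5]))  # Span ("salty fries") vs. Token ("hamburgers")
print(doc1[3].vector.shape)           # static vector for the token "fries"

# Vectors can also be looked up directly on the Vocab, without creating a Doc.
print(nlp.vocab["fries"].vector[:5])
```

Because the vectors are static and per-type, these similarity scores are a rough, context-free signal: useful for comparing terms and short texts, but no substitute for a contextual model when word order and surrounding context matter.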