diff --git a/website/docs/images/sense2vec.jpg b/website/docs/images/sense2vec.jpg
new file mode 100644
index 000000000..3a1772582
Binary files /dev/null and b/website/docs/images/sense2vec.jpg differ
diff --git a/website/docs/usage/101/_vectors-similarity.md b/website/docs/usage/101/_vectors-similarity.md
index a04c96236..92df1b331 100644
--- a/website/docs/usage/101/_vectors-similarity.md
+++ b/website/docs/usage/101/_vectors-similarity.md
@@ -80,25 +80,73 @@ duplicate if it's very similar to an already existing one.
Each [`Doc`](/api/doc), [`Span`](/api/span), [`Token`](/api/token) and
[`Lexeme`](/api/lexeme) comes with a [`.similarity`](/api/token#similarity)
method that lets you compare it with another object, and determine the
-similarity. Of course similarity is always subjective – whether "dog" and "cat"
-are similar really depends on how you're looking at it. spaCy's similarity model
-usually assumes a pretty general-purpose definition of similarity.
+similarity. Of course similarity is always subjective – whether two words, spans
+or documents are similar really depends on how you're looking at it. spaCy's
+similarity model usually assumes a pretty general-purpose definition of
+similarity.
-
+> #### 📝 Things to try
+>
+> 1. Compare two different tokens and try to find the two most _dissimilar_
+>    tokens in the texts – the pair with the lowest similarity score
+>    (according to the vectors).
+> 2. Compare the similarity of two [`Lexeme`](/api/lexeme) objects, entries in
+> the vocabulary. You can get a lexeme via the `.lex` attribute of a token.
+> You should see that the similarity results are identical to the token
+> similarity.
```python
### {executable="true"}
import spacy
nlp = spacy.load("en_core_web_md") # make sure to use larger model!
-tokens = nlp("dog cat banana")
+doc1 = nlp("I like salty fries and hamburgers.")
+doc2 = nlp("Fast food tastes very good.")
-for token1 in tokens:
- for token2 in tokens:
- print(token1.text, token2.text, token1.similarity(token2))
+# Similarity of two documents
+print(doc1, "<->", doc2, doc1.similarity(doc2))
+# Similarity of tokens and spans
+french_fries = doc1[2:4]
+burgers = doc1[5]
+print(french_fries, "<->", burgers, french_fries.similarity(burgers))
```
-In this case, the model's predictions are pretty on point. A dog is very similar
-to a cat, whereas a banana is not very similar to either of them. Identical
-tokens are obviously 100% similar to each other (just not always exactly `1.0`,
-because of vector math and floating point imprecisions).
+### What to expect from similarity results {#similarity-expectations}
+
+Computing similarity scores can be helpful in many situations, but it's also
+important to maintain **realistic expectations** about what information it can
+provide. Words can be related to each other in many ways, so a single
+"similarity" score will always be a **mix of different signals**, and vectors
+trained on different data can produce very different results that may not be
+useful for your purpose. Here are some important considerations to keep in mind:
+
+- There's no objective definition of similarity. Whether "I like burgers" and "I
+  like pasta" are similar **depends on your application**. Both talk about food
+ preferences, which makes them very similar – but if you're analyzing mentions
+ of food, those sentences are pretty dissimilar, because they talk about very
+ different foods.
+- The similarity of [`Doc`](/api/doc) and [`Span`](/api/span) objects defaults
+ to the **average** of the token vectors. This means that the vector for "fast
+ food" is the average of the vectors for "fast" and "food", which isn't
+ necessarily representative of the phrase "fast food".
+- Vector averaging means that the vector of multiple tokens is **insensitive to
+ the order** of the words. Two documents expressing the same meaning with
+ dissimilar wording will return a lower similarity score than two documents
+ that happen to contain the same words while expressing different meanings.
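The averaging behavior described in the list above can be sketched with plain vectors. This is a minimal illustration using hypothetical toy vectors and numpy, not spaCy's actual implementation: because addition is commutative, reordering the tokens leaves the averaged document vector unchanged, so the cosine similarity between the two orderings is 1.0.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two vectors
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy word vectors, purely for illustration
fast = np.array([1.0, 0.0, 0.5])
food = np.array([0.2, 1.0, 0.1])

# A document vector that defaults to the average of its token vectors
doc_ab = (fast + food) / 2  # "fast food"
doc_ba = (food + fast) / 2  # "food fast" – same tokens, different order

print(cosine(doc_ab, doc_ba))  # 1.0 (up to floating point rounding)
```

Averaging is order-insensitive, which is why word order carries no weight in the default similarity of multi-token objects.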
+
+
+
+[![](../../images/sense2vec.jpg)](https://github.com/explosion/sense2vec)
+
+[`sense2vec`](https://github.com/explosion/sense2vec) is a library developed by
+us that builds on top of spaCy and lets you train and query more interesting and
+detailed word vectors. It combines noun phrases like "fast food" or "fair game"
+and includes the part-of-speech tags and entity labels. The library also
+includes annotation recipes for our annotation tool [Prodigy](https://prodi.gy)
+that let you evaluate vector models and create terminology lists. For more
+details, check out
+[our blog post](https://explosion.ai/blog/sense2vec-reloaded). To explore the
+semantic similarities across all Reddit comments of 2015 and 2019, see the
+[interactive demo](https://explosion.ai/demos/sense2vec).
+
+
diff --git a/website/docs/usage/linguistic-features.md b/website/docs/usage/linguistic-features.md
index 10efcf875..3aa0df7b4 100644
--- a/website/docs/usage/linguistic-features.md
+++ b/website/docs/usage/linguistic-features.md
@@ -1547,23 +1547,6 @@ import Vectors101 from 'usage/101/\_vectors-similarity.md'
-
-
-Computing similarity scores can be helpful in many situations, but it's also
-important to maintain **realistic expectations** about what information it can
-provide. Words can be related to each over in many ways, so a single
-"similarity" score will always be a **mix of different signals**, and vectors
-trained on different data can produce very different results that may not be
-useful for your purpose.
-
-Also note that the similarity of `Doc` or `Span` objects defaults to the
-**average** of the token vectors. This means it's insensitive to the order of
-the words. Two documents expressing the same meaning with dissimilar wording
-will return a lower similarity score than two documents that happen to contain
-the same words while expressing different meanings.
-
-
-
### Adding word vectors {#adding-vectors}
Custom word vectors can be trained using a number of open-source libraries, such
diff --git a/website/src/components/link.js b/website/src/components/link.js
index 3644479c5..acded7d0d 100644
--- a/website/src/components/link.js
+++ b/website/src/components/link.js
@@ -6,7 +6,7 @@ import classNames from 'classnames'
import Icon from './icon'
import classes from '../styles/link.module.sass'
-import { isString } from './util'
+import { isString, isImage } from './util'
const internalRegex = /(http(s?)):\/\/(prodi.gy|spacy.io|irl.spacy.io|explosion.ai|course.spacy.io)/gi
@@ -39,7 +39,7 @@ export default function Link({
const dest = to || href
const external = forceExternal || /(http(s?)):\/\//gi.test(dest)
const icon = getIcon(dest)
- const withIcon = !hidden && !hideIcon && !!icon
+ const withIcon = !hidden && !hideIcon && !!icon && !isImage(children)
const sourceWithText = withIcon && isString(children)
const linkClassNames = classNames(classes.root, className, {
[classes.hidden]: hidden,
diff --git a/website/src/components/util.js b/website/src/components/util.js
index 844f2c133..a9c6efcf5 100644
--- a/website/src/components/util.js
+++ b/website/src/components/util.js
@@ -46,6 +46,17 @@ export function isString(obj) {
return typeof obj === 'string' || obj instanceof String
}
+/**
+ * @param obj - The object to check.
+ * @returns {boolean} - Whether the object is an image.
+ */
+export function isImage(obj) {
+ if (!obj || !React.isValidElement(obj)) {
+ return false
+ }
+  return obj.props.name === 'img' || obj.props.className === 'gatsby-resp-image-wrapper'
+}
+
/**
* @param obj - The object to check.
* @returns {boolean} - Whether the object is empty.