spaCy/website/docs/api
Connor Brinton 6dd56868de
📝 Fix formula for receptive field in docs (#12918)
spaCy's `HashEmbedCNN` layer performs convolutions over tokens to produce
contextualized embeddings using a `MaxoutWindowEncoder` layer. These
convolutions are implemented using Thinc's `expand_window` layer, which
concatenates `window_size` neighboring sequence items on either side of
the sequence item being processed. This is repeated across `depth`
convolutional layers.
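
To make the window concatenation concrete, here is a minimal pure-Python
sketch of the idea. It is not Thinc's implementation: the function name
`expand_window_sketch` is hypothetical, and zero-padding at the sequence
edges is an assumption of this sketch.

```python
def expand_window_sketch(rows, window_size):
    """Concatenate each row with `window_size` neighbors on either side.

    Hypothetical helper for illustration only. Positions that fall outside
    the sequence are filled with zeros (an assumption of this sketch).
    """
    if not rows:
        return []
    pad = [0.0] * len(rows[0])
    expanded = []
    for i in range(len(rows)):
        combined = []
        # Visit the left neighbors, the item itself, then the right neighbors.
        for j in range(i - window_size, i + window_size + 1):
            combined.extend(rows[j] if 0 <= j < len(rows) else pad)
        expanded.append(combined)
    return expanded


# Three sequence items, each with a single feature.
rows = [[1.0], [2.0], [3.0]]
print(expand_window_sketch(rows, window_size=1))
# [[0.0, 1.0, 2.0], [1.0, 2.0, 3.0], [2.0, 3.0, 0.0]]
```

Each output row is `window_size * 2 + 1` times as wide as an input row,
which is where the `window_size * 2 + 1` term in the formulas discussed
below comes from.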

For example, consider the sequence "ABCDE" and a `MaxoutWindowEncoder`
layer with a context window of 1 and a depth of 2. We'll focus on the
token "C". We can visually represent the contextual embedding produced
for "C" as:
```mermaid
flowchart LR
A0(A<sub>0</sub>)
B0(B<sub>0</sub>)
C0(C<sub>0</sub>)
D0(D<sub>0</sub>)
E0(E<sub>0</sub>)
B1(B<sub>1</sub>)
C1(C<sub>1</sub>)
D1(D<sub>1</sub>)
C2(C<sub>2</sub>)
A0 --> B1
B0 --> B1
C0 --> B1
B0 --> C1
C0 --> C1
D0 --> C1
C0 --> D1
D0 --> D1
E0 --> D1
B1 --> C2
C1 --> C2
D1 --> C2
```

Described in words, this graph shows that before the first layer of the
convolution, the "receptive field" centered at each token consists only
of that same token. That is, we start with a receptive field of 1.
The first layer of the convolution adds one neighboring token on either
side to the receptive field. Since this is done on both sides, the
receptive field increases by 2, giving the first layer a receptive field
of 3. The second layer of the convolution adds an _additional_
neighboring token on either side to the receptive field, giving a final
receptive field of 5.
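
The same walk-through can be checked with a small simulation. This is a
sketch, not code from spaCy or Thinc: the hypothetical function
`receptive_field` tracks which input positions can influence a given
output position as layers are stacked.

```python
def receptive_field(position, seq_len, window_size, depth):
    """Return the sorted input positions that influence `position`.

    Hypothetical helper for illustration: each layer lets every position
    see `window_size` neighbors on either side, clipped to the sequence.
    """
    field = {position}
    for _ in range(depth):
        expanded = set()
        for i in field:
            for j in range(i - window_size, i + window_size + 1):
                if 0 <= j < seq_len:
                    expanded.add(j)
        field = expanded
    return sorted(field)


tokens = "ABCDE"
center = tokens.index("C")
for depth in range(3):
    field = receptive_field(center, len(tokens), window_size=1, depth=depth)
    print(depth, [tokens[i] for i in field])
# 0 ['C']
# 1 ['B', 'C', 'D']
# 2 ['A', 'B', 'C', 'D', 'E']
```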

However, this doesn't match the formula currently given in the docs,
which reads:
> The receptive field of the CNN will be
> `depth * (window_size * 2 + 1)`, so a 4-layer network with a window
> size of `2` will be sensitive to 20 words at a time.

Substituting in our depth of 2 and window size of 1, this formula gives
us a receptive field of:
```
depth * (window_size * 2 + 1)
= 2 * (1 * 2 + 1)
= 2 * (2 + 1)
= 2 * 3
= 6
```

This not only doesn't match our computations from above, it's also an
even number! This is suspicious, since the receptive field is supposed
to be centered on a token, and not between tokens. Generally, this
formula results in an even number for any even value of `depth`.
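
The parity claim is easy to verify: `window_size * 2 + 1` is always odd,
so the product is even exactly when `depth` is even. A quick check over
a few small values (the function name `documented_formula` is only a
label for this sketch):

```python
def documented_formula(depth, window_size):
    """The formula as currently given in the docs."""
    return depth * (window_size * 2 + 1)


for depth in range(1, 7):
    for window_size in range(1, 4):
        value = documented_formula(depth, window_size)
        # The result is even exactly when `depth` is even.
        assert (value % 2 == 0) == (depth % 2 == 0)
print("parity of the result always follows the parity of depth")
```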

The error in this formula is that the adjustment for the center token
is multiplied by the depth, when it should occur only once. The
corrected formula, `depth * window_size * 2 + 1`, gives the correct
value for our small example from above:
```
depth * window_size * 2 + 1
= 2 * 1 * 2 + 1
= 4 + 1
= 5
```
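
For a side-by-side comparison, the two formulas can be evaluated on the
small example and on the 4-layer, window-size-2 network from the docs.
The function names below are only labels for this sketch:

```python
def old_receptive_field(depth, window_size):
    """Formula previously given in the docs."""
    return depth * (window_size * 2 + 1)


def corrected_receptive_field(depth, window_size):
    """Corrected formula: the center token is counted only once."""
    return depth * window_size * 2 + 1


for depth, window_size in [(2, 1), (4, 2)]:
    old = old_receptive_field(depth, window_size)
    new = corrected_receptive_field(depth, window_size)
    print(f"depth={depth} window_size={window_size} old={old} corrected={new}")
# depth=2 window_size=1 old=6 corrected=5
# depth=4 window_size=2 old=20 corrected=17
```

Under the corrected formula, the 4-layer network with a window size of
`2` is sensitive to 17 words at a time rather than 20.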

These changes update the docs to correct the receptive field formula and
the example receptive field size.
2023-08-21 10:52:32 +02:00
architectures.mdx 📝 Fix formula for receptive field in docs (#12918) 2023-08-21 10:52:32 +02:00
attributeruler.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
attributes.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
cli.mdx Add cli for finding locations of registered func (#12757) 2023-07-31 09:39:00 +02:00
coref.mdx corrected example code (#12466) 2023-03-27 11:32:49 +02:00
corpus.mdx Add spacy.PlainTextCorpusReader.v1 (#12122) 2023-01-26 11:33:22 +01:00
cython-classes.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
cython-structs.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
cython.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
data-formats.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
dependencymatcher.mdx docs(REL_OP): modify docs for REL_OPs to match Semgrex's update on CoreNLP v4.5.2 (#12531) 2023-04-17 13:14:01 +02:00
dependencyparser.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
doc.mdx Backslash fixes in docs (#12213) 2023-02-01 10:15:38 +01:00
docbin.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
edittreelemmatizer.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
entitylinker.mdx Fix new tags in docs for v3.5.x (#12629) 2023-05-15 12:06:58 +02:00
entityrecognizer.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
entityruler.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
example.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
index.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
inmemorylookupkb.mdx Update inmemorylookupkb.mdx (#12586) 2023-05-02 12:51:13 +02:00
kb.mdx API docs: Rename kb_in_memory to inmemorylookupkb, add to sidebar (#12128) 2023-01-19 13:29:17 +01:00
language.mdx Typo fix in Language.replace_listeners docs (#12823) 2023-07-14 09:45:54 +02:00
large-language-models.mdx Add spacy-llm docs to website (#12782) 2023-07-24 14:44:47 +02:00
legacy.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
lemmatizer.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
lexeme.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
lookups.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
matcher.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
morphologizer.mdx Tagger label smoothing (#12293) 2023-03-22 12:17:56 +01:00
morphology.mdx Fix new tags in docs for v3.5.x (#12629) 2023-05-15 12:06:58 +02:00
phrasematcher.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
pipe.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
pipeline-functions.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
scorer.mdx Add scorer option to return per-component scores (#12540) 2023-05-12 15:36:54 +02:00
sentencerecognizer.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
sentencizer.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
span-resolver.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
span.mdx Add span_id to Span.char_span, update Doc/Span.char_span docs (#12196) 2023-01-27 15:09:17 +01:00
spancategorizer.mdx SpanCat: Remove invalid threshold config argument (#12860) 2023-07-26 13:56:31 +02:00
spanfinder.mdx Update max_length default in span finder docs (#12803) 2023-07-07 10:17:41 +02:00
spangroup.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
spanruler.mdx fix (#12881) 2023-08-03 08:37:43 +02:00
stringstore.mdx Add info to stringstore and vocab (#12471) 2023-03-27 13:15:14 +02:00
tagger.mdx Tagger label smoothing (#12293) 2023-03-22 12:17:56 +01:00
textcategorizer.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
tok2vec.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
token.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
tokenizer.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
top-level.mdx Docs: clarify abstract spacy.load examples (#12889) 2023-08-16 17:28:34 +02:00
transformer.mdx Website migration from Gatsby to Next (#12058) 2023-01-11 17:30:07 +01:00
vectors.mdx Support custom token/lexeme attribute for vectors (#12625) 2023-06-28 09:43:14 +02:00
vocab.mdx Add info to stringstore and vocab (#12471) 2023-03-27 13:15:14 +02:00