mirror of https://github.com/explosion/spaCy.git synced 2025-07-10 00:02:19 +03:00

Connor Brinton 6dd56868de 📝 Fix formula for receptive field in docs (#12918)

spaCy's HashEmbedCNN layer performs convolutions over tokens to produce contextualized embeddings using a `MaxoutWindowEncoder` layer. These convolutions are implemented using Thinc's `expand_window` layer, which concatenates `window_size` neighboring sequence items on either side of the sequence item being processed. This is repeated across `depth` convolutional layers.

For example, consider the sequence "ABCDE" and a `MaxoutWindowEncoder` layer with a context window of 1 and a depth of 2. We'll focus on the token "C". We can visually represent the contextual embedding produced for "C" as:

```mermaid
flowchart LR
    A0(A<sub>0</sub>)
    B0(B<sub>0</sub>)
    C0(C<sub>0</sub>)
    D0(D<sub>0</sub>)
    E0(E<sub>0</sub>)
    B1(B<sub>1</sub>)
    C1(C<sub>1</sub>)
    D1(D<sub>1</sub>)
    C2(C<sub>2</sub>)
    A0 --> B1
    B0 --> B1
    C0 --> B1
    B0 --> C1
    C0 --> C1
    D0 --> C1
    C0 --> D1
    D0 --> D1
    E0 --> D1
    B1 --> C2
    C1 --> C2
    D1 --> C2
```

Described in words, this graph shows that before the first layer of the convolution, the "receptive field" centered at each token consists only of that same token. That is to say, we have a receptive field of 1. The first layer of the convolution adds one neighboring token on either side to the receptive field. Since this is done on both sides, the receptive field increases by 2, giving the first layer a receptive field of 3. The second layer of the convolution adds an _additional_ neighboring token on either side, giving a final receptive field of 5.

However, this doesn't match the formula currently given in the docs, which reads:

> The receptive field of the CNN will be
> `depth * (window_size * 2 + 1)`, so a 4-layer network with a window
> size of `2` will be sensitive to 20 words at a time.

Substituting in our depth of 2 and window size of 1, this formula gives us a receptive field of:

```
depth * (window_size * 2 + 1)
= 2 * (1 * 2 + 1)
= 2 * (2 + 1)
= 2 * 3
= 6
```

This not only doesn't match our computations from above, it's also an even number! This is suspicious, since the receptive field is supposed to be centered on a token, not between tokens. In general, this formula yields an even number for any even value of `depth`.

The error in this formula is that the adjustment for the center token is multiplied by the depth, when it should be added only once. The corrected formula, `depth * window_size * 2 + 1`, gives the correct value for our small example from above:

```
depth * window_size * 2 + 1
= 2 * 1 * 2 + 1
= 4 + 1
= 5
```

These changes update the docs to correct the receptive field formula and the example receptive field size.

2023-08-21 10:52:32 +02:00
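As a sanity check on the corrected receptive field formula, the sketch below models the stacked window layers directly. It tracks which input offsets can reach a single output token. This is a hypothetical illustration: the helper names are invented here, it is not Thinc's `expand_window` implementation, and it ignores sequence boundaries and padding.

```python
def simulated_receptive_field(depth: int, window_size: int) -> int:
    """Count the input offsets that can influence one output token.

    Hypothetical helper (not part of spaCy or Thinc): models `depth` stacked
    layers, each concatenating `window_size` neighbours on either side.
    Sequence boundaries are ignored, as if the sequence were unbounded.
    """
    # Before any layer, a token only "sees" itself (offset 0).
    offsets = {0}
    for _ in range(depth):
        # Each layer lets a position see every offset within window_size
        # of any offset it could already see.
        offsets = {
            o + k for o in offsets for k in range(-window_size, window_size + 1)
        }
    return len(offsets)


def old_formula(depth: int, window_size: int) -> int:
    # Formula previously given in the docs: counts the centre token per layer.
    return depth * (window_size * 2 + 1)


def corrected_formula(depth: int, window_size: int) -> int:
    # Corrected formula: the centre token is counted only once.
    return depth * window_size * 2 + 1


for depth, window_size in [(2, 1), (4, 2)]:
    print(
        f"depth={depth}, window_size={window_size}: "
        f"simulated={simulated_receptive_field(depth, window_size)}, "
        f"old={old_formula(depth, window_size)}, "
        f"corrected={corrected_formula(depth, window_size)}"
    )

# Expected output:
# depth=2, window_size=1: simulated=5, old=6, corrected=5
# depth=4, window_size=2: simulated=17, old=20, corrected=17
```

Each layer widens the field by `window_size` tokens on each side. The simulated width therefore starts at 1 and grows by `2 * window_size` per layer, which gives exactly `depth * window_size * 2 + 1`. The old formula instead adds the centre token once per layer. For the docs example (4 layers, window size 2), this gives 17 tokens rather than 20.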
.vscode	Website migration from Gatsby to Next (#12058)	2023-01-11 17:30:07 +01:00
docs	📝 Fix formula for receptive field in docs (#12918)	2023-08-21 10:52:32 +02:00
meta	Update universe.json (#12904)	2023-08-14 16:44:14 +02:00
pages	Add spacy-llm docs to website (#12782)	2023-07-24 14:44:47 +02:00
plugins	Website migration from Gatsby to Next (#12058)	2023-01-11 17:30:07 +01:00
public	Add spaCy VSCode extension materials (#12592)	2023-05-19 14:38:53 +02:00
setup	Website migration from Gatsby to Next (#12058)	2023-01-11 17:30:07 +01:00
src	Update CuPy extras (#12890)	2023-08-08 12:58:28 +02:00
.dockerignore	Update Dockerfile to work with Next.js (#12119)	2023-01-18 18:15:47 +01:00
.eslintrc	Tidy up website and add eslint config [ci skip]	2019-03-12 15:21:58 +01:00
.eslintrc.json	Website migration from Gatsby to Next (#12058)	2023-01-11 17:30:07 +01:00
.gitignore	Move all website gitignore settings to website/.gitignore (#12120)	2023-01-18 21:46:19 +01:00
.nvmrc	Website migration from Gatsby to Next (#12058)	2023-01-11 17:30:07 +01:00
.prettierignore	Website migration from Gatsby to Next (#12058)	2023-01-11 17:30:07 +01:00
.prettierrc	Website migration from Gatsby to Next (#12058)	2023-01-11 17:30:07 +01:00
Dockerfile	Update Dockerfile to work with Next.js (#12119)	2023-01-18 18:15:47 +01:00
netlify.toml	Website migration from Gatsby to Next (#12058)	2023-01-11 17:30:07 +01:00
next-sitemap.config.mjs	Website migration from Gatsby to Next (#12058)	2023-01-11 17:30:07 +01:00
next.config.mjs	Website migration from Gatsby to Next (#12058)	2023-01-11 17:30:07 +01:00
package-lock.json	Website migration from Gatsby to Next (#12058)	2023-01-11 17:30:07 +01:00
package.json	Make sure to run Python setup before NPM dev mode (#12384)	2023-03-08 11:59:10 +01:00
README.md	Update Dockerfile to work with Next.js (#12119)	2023-01-18 18:15:47 +01:00
runtime.txt	Website migration from Gatsby to Next (#12058)	2023-01-11 17:30:07 +01:00
tsconfig.json	Website migration from Gatsby to Next (#12058)	2023-01-11 17:30:07 +01:00
UNIVERSE.md	Website migration from Gatsby to Next (#12058)	2023-01-11 17:30:07 +01:00

README.md

spacy.io website and docs

The styleguide for the spaCy website is available at spacy.io/styleguide.

Setup and installation

# Clone the repository
git clone https://github.com/explosion/spaCy
cd spaCy/website

# Switch to the correct Node version
#
# If you don't have NVM and don't want to use it, you can manually switch to the Node version
# stated in /.nvmrc and skip this step
nvm use

# Install the dependencies
npm install

# Start the development server
npm run dev

If you plan to edit the site, you should also set up the Prettier code formatter. It automatically formats Markdown and other files. See the Prettier documentation for the available extensions for your code editor. The .prettierrc file in the root defines the settings used in this codebase.

Building & developing the site with Docker

While it shouldn't be necessary and is not recommended, you can run this site in a Docker container.

If you'd like to do this, make sure you do not include your local node_modules folder, since some dependencies need to be built for the image's system. Rename the folder before using Docker.

First, build the Docker image. This only needs to be done on the first run or when changes are made to the Dockerfile or the website dependencies:

docker build -t spacy-io .

You can then build and run the website with:

docker run -it \
  --rm \
  -v $(pwd):/home/node/website \
  -p 3000:3000 \
  spacy-io \
  npm run dev -- -H 0.0.0.0

This lets you access the website at http://0.0.0.0:3000/ in your browser while you keep editing code in your editor, with the site reflecting your changes.

Project structure

├── docs                 # the actual markdown content
├── meta                 # JSON-formatted site metadata
|   ├── dynamicMeta.js   # metadata generated at build time
|   ├── languages.json   # supported languages and statistical models
|   ├── sidebars.json    # sidebar navigations for different sections
|   ├── site.json        # general site metadata
|   ├── type-annotations.json # Type annotations
|   └── universe.json    # data for the spaCy universe section
├── pages                # Next router pages
├── public               # static images and other assets
├── setup                # Jinja setup
├── src                  # source
|   ├── components       # React components
|   ├── fonts            # webfonts
|   ├── images           # images used in the layout
|   ├── plugins          # custom plugins to transform Markdown
|   ├── styles           # CSS modules and global styles
|   ├── templates        # page layouts
|   |   ├── docs.js      # layout template for documentation pages
|   |   ├── index.js     # global layout template
|   |   ├── models.js    # layout template for model pages
|   |   └── universe.js  # layout templates for universe
|   └── widgets          # non-reusable components with content, e.g. changelog
├── .eslintrc.json       # ESLint config file
├── .nvmrc               # NVM config file
|                        # (to support "nvm use" to switch to correct Node version)
|
├── .prettierrc          # Prettier config file
├── next.config.mjs      # Next config file
├── package.json         # package settings and dependencies
└── tsconfig.json        # TypeScript config file