53687b5bca
* [wip] Update * [wip] Update * Add initial port * [wip] Update * Fix all imports * Add spancat_exclusive to pipeline * [WIP] Update * [ci skip] Add breakpoint for debugging * Use spacy.SpanCategorizer.v1 as default archi * Update spacy/pipeline/spancat_exclusive.py Co-authored-by: kadarakos <kadar.akos@gmail.com> * [ci skip] Small updates * Use Softmax v2 directly from thinc * Cache the label map * Fix mypy errors However, I ignored line 370 because it opened up a bunch of type errors that might be trickier to solve and might lead to a more complicated codebase. * avoid multiplication with 1.0 Co-authored-by: kadarakos <kadar.akos@gmail.com> * Update spacy/pipeline/spancat_exclusive.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update component versions to v2 * Add scorer to docstring * Add _n_labels property to SpanCategorizer Instead of using len(self.labels) in initialize() I am using a private property self._n_labels. This achieves implementation parity and allows me to delete the whole initialize() method for spancat_exclusive (since it's now the same with spancat). * Inherit from SpanCat instead of TrainablePipe This commit changes the inheritance structure of Exclusive_Spancat, now it's inheriting from SpanCategorizer than TrainablePipe. This allows me to remove duplicate methods that are already present in the parent function. * Revert documentation link to spancat * Fix init call for exclusive spancat * Update spacy/pipeline/spancat_exclusive.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Import Suggester from spancat * Include zero_init.v1 for spancat * Implement _allow_extra_label to use _n_labels To ensure that spancat / spancat_exclusive cannot be resized after initialization, I inherited the _allow_extra_label() method from spacy/pipeline/trainable_pipe.pyx and used self._n_labels instead of len(self.labels) for checking. I think that changing it locally is a better solution rather than forcing each class that inherits TrainablePipe to use the self._n_labels attribute. Also note that I turned-off black formatting in this block of code because it reads better without the overhang. * Extend existing tests to spancat_exclusive In this commit, I extended the existing tests for spancat to include spancat_exclusive. I parametrized the test functions with 'name' (similar var name with textcat and textcat_multilabel) for each applicable test. TODO: Add overfitting tests for spancat_exclusive * Update documentation for spancat * Turn on formatting for allow_extra_label * Remove initializers in default config * Use DEFAULT_EXCL_SPANCAT_MODEL I also renamed spancat_exclusive_default_config into spancat_excl_default_config because black does some not pretty formatting changes. * Update documentation Update grammar and usage Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Clarify docstring for Exclusive_SpanCategorizer * Remove mypy ignore and typecast labels to list * Fix documentation API * Use a single variable for tests * Update defaults for number of rows Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Put back initializers in spancat config Whenever I remove model.scorer.init_w and model.scorer.init_b, I encounter an error in the test: SystemError: <method '__getitem__' of 'dict' objects> returned a result with an error set. My Thinc version is 8.1.5, but I can't seem to check what's causing the error. * Update spancat_exclusive docstring * Remove init_W and init_B parameters This commit is expected to fail until the new Thinc release. * Require thinc>=8.1.6 for serializable Softmax defaults * Handle zero suggestions to make tests pass I'm not sure if this is the most elegant solution. But what should happen is that the _make_span_group function MUST return an empty SpanGroup if there are no suggestions. The error happens when the 'scores' variable is empty. We cannot get the 'predicted' and other downstream vars. * Better approach for handling zero suggestions * Update website/docs/api/spancategorizer.md Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update spancategorizer headers * Apply suggestions from code review Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Add default value in negative_weight in docs * Add default value in allow_overlap in docs * Update how spancat_exclusive is constructed In this commit, I added the following: - Put the default values of negative_weight and allow_overlap in the default_config dictionary. - Rename make_spancat -> make_exclusive_spancat * Run prettier on spancategorizer.mdx * Change exactly one -> at most one * Add suggester documentation in Exclusive_SpanCategorizer * Add suggester to spancat docstrings * merge multilabel and singlelabel spancat * rename spancat_exclusive to singlelable * wire up different make_spangroups for single and multilabel * black * black * add docstrings * more docstring and fix negative_label * don't rely on default arguments * black * remove spancat exclusive * replace single_label with add_negative_label and adjust inference * mypy * logical bug in configuration check * add spans.attrs[scores] * single label make_spangroup test * bugfix * black * tests for make_span_group with negative labels * refactor make_span_group * black * Update spacy/tests/pipeline/test_spancat.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * remove duplicate declaration * Update spacy/pipeline/spancat.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * raise error instead of just print * make label mapper private * update docs * run prettier * Update website/docs/api/spancategorizer.mdx Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update website/docs/api/spancategorizer.mdx Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update spacy/pipeline/spancat.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update spacy/pipeline/spancat.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update spacy/pipeline/spancat.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update spacy/pipeline/spancat.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * don't keep recomputing self._label_map for each span * typo in docs * Intervals to private and document 'name' param * Update spacy/pipeline/spancat.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update spacy/pipeline/spancat.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * add Tag to new features * replace tags * revert * revert * revert * revert * Update website/docs/api/spancategorizer.mdx Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update website/docs/api/spancategorizer.mdx Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * prettier * Fix merge * Update website/docs/api/spancategorizer.mdx * remove references to 'single_label' * remove old paragraph * Add spancat_singlelabel to config template * Format * Extend init config tests --------- Co-authored-by: kadarakos <kadar.akos@gmail.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> |
||
---|---|---|
.. | ||
.vscode | ||
docs | ||
meta | ||
pages | ||
plugins | ||
public | ||
setup | ||
src | ||
.dockerignore | ||
.eslintrc | ||
.eslintrc.json | ||
.gitignore | ||
.nvmrc | ||
.prettierignore | ||
.prettierrc | ||
Dockerfile | ||
netlify.toml | ||
next-sitemap.config.mjs | ||
next.config.mjs | ||
package-lock.json | ||
package.json | ||
README.md | ||
runtime.txt | ||
tsconfig.json | ||
UNIVERSE.md |
spacy.io website and docs
The styleguide for the spaCy website is available at spacy.io/styleguide.
Setup and installation
# Clone the repository
git clone https://github.com/explosion/spaCy
cd spaCy/website
# Switch to the correct Node version
#
# If you don't have NVM and don't want to use it, you can manually switch to the Node version
# stated in /.nvmrc and skip this step
nvm use
# Install the dependencies
npm install
# Start the development server
npm run dev
If you are planning on making edits to the site, you should also set up the
Prettier code formatter. It takes care of formatting
Markdown and other files automatically.
See here for the available
extensions for your code editor. The
.prettierrc
file in the root defines the settings used in this codebase.
Building & developing the site with Docker
While it shouldn't be necessary and is not recommended you can run this site in a Docker container.
If you'd like to do this, be sure you do not include your local
node_modules
folder, since there are some dependencies that need to be built
for the image system. Rename it before using.
First build the Docker image. This only needs to be done on the first run
or when changes are made to Dockerfile
or the website dependencies:
docker build -t spacy-io .
You can then build and run the website with:
docker run -it \
--rm \
-v $(pwd):/home/node/website \
-p 3000:3000 \
spacy-io \
npm run dev -- -H 0.0.0.0
This will allow you to access the built website at http://0.0.0.0:3000/ in your browser, and still edit code in your editor while having the site reflect those changes.
Project structure
├── docs # the actual markdown content
├── meta # JSON-formatted site metadata
| ├── dynamicMeta.js # At build time generated meta data
| ├── languages.json # supported languages and statistical models
| ├── sidebars.json # sidebar navigations for different sections
| ├── site.json # general site metadata
| ├── type-annotations.json # Type annotations
| └── universe.json # data for the spaCy universe section
├── pages # Next router pages
├── public # static images and other assets
├── setup # Jinja setup
├── src # source
| ├── components # React components
| ├── fonts # webfonts
| ├── images # images used in the layout
| ├── plugins # custom plugins to transform Markdown
| ├── styles # CSS modules and global styles
| ├── templates # page layouts
| | ├── docs.js # layout template for documentation pages
| | ├── index.js # global layout template
| | ├── models.js # layout template for model pages
| | └── universe.js # layout templates for universe
| └── widgets # non-reusable components with content, e.g. changelog
├── .eslintrc.json # ESLint config file
├── .nvmrc # NVM config file
| # (to support "nvm use" to switch to correct Node version)
|
├── .prettierrc # Prettier config file
├── next.config.mjs # Next config file
├── package.json # package settings and dependencies
└── tsconfig.json # TypeScript config file