Commit Graph

411 Commits

Author SHA1 Message Date
Adriane Boyd
0fe43f40f1
Support registered vectors (#12492)
* Support registered vectors

* Format

* Auto-fill [nlp] on load from config and from bytes/disk

* Only auto-fill [nlp]

* Undo all changes to Language.from_disk

* Expand BaseVectors

These methods are needed in various places for training and vector
similarity.

* isort

* More linting

* Only fill [nlp.vectors]

* Update spacy/vocab.pyx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Revert changes to test related to auto-filling [nlp]

* Add vectors registry

* Rephrase error about vocab methods for vectors

* Switch to dummy implementation for BaseVectors.to_ops

* Add initial draft of docs

* Remove example from BaseVectors docs

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/docs/api/basevectors.mdx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Fix type and lint bpemb example

* Update website/docs/api/basevectors.mdx

---------

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-08-01 15:46:08 +02:00
Adriane Boyd
1a55661cfb
Update website binder version to v3.6 (#12805) 2023-07-07 10:52:33 +02:00
Adriane Boyd
4e19ec7eb8
Docs for v3.6.0 (#12792)
* Docs for v3.6.0

* Add sl performance

* Add da trf note
2023-07-06 12:58:25 +02:00
Tom Aarsen
eab929361d
Use 'exclude' instead of 'disable' (#12783)
as suggested by @svlandeg
2023-07-04 11:45:13 +02:00
Tom Aarsen
93983f08fc
Add SpanMarker for NER to spaCy universe (#12730)
* Add SpanMarker for NER to spaCy universe

* Escape the newlines in the text in the code example

Or at least, attempt to

* Remove now unnecessary import

* Disable NER pipeline component in code example
2023-06-20 16:47:44 +02:00
David Berenstein
53c400bd7a
docs: added reference to spacy-setfit to the spaCy Universe (#12737)
* docs: added reference to spacy-setfit

* removed package import after adding factory entry points to packages
2023-06-19 15:52:07 +02:00
Jacobo Myerston
daa6e0339f
Update universe.json (#12709)
* Update universe.json

* Update universe.json

add some missing commas in greCy's description.
2023-06-12 13:55:20 +02:00
kadarakos
c003aac29a
SpanFinder into spaCy from experimental (#12507)
* span finder integrated into spacy from experimental

* black

* isort

* black

* default spankey constant

* black

* Update spacy/pipeline/spancat.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* rename

* rename

* max_length and min_length as Optional[int] and strict checking

* black

* mypy fix for integer type infinity

* revert line order

* implement all comparison operators for inf int

* avoid two for loops over all docs by not precomputing

* interleave thresholding with span creation

* black

* revert to not interleaving (realized it's faster)

* black

* Update spacy/errors.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* update docstring

* enforce that the gold and predicted documents have the same text

* new error for ensuring reference and predicted texts are the same

* remove todo

* adjust test

* black

* handle misaligned tokenization

* return correct variable

* failing overfit test

* only use a single spans_key like in spancat

* black

* remove debug lines

* typo

* remove comment

* remove near-duplicate redundant method

* use the 'spans_key' variable name everywhere

* Update spacy/pipeline/span_finder.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* flaky test fix suggestion, hand set bias terms

* only test suggester and test result exhaustively

* make it clear that the span_finder_suggester is more general (not specific to span_finder)

* Update spacy/tests/pipeline/test_span_finder.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Apply suggestions from code review

* remove question comment

* move preset_spans_suggester test to spancat tests

* Add docs and unify default configs for spancat and span finder

* Add `allow_overlap=True` to span finder scorer

* Fix offset bug in set_annotations

* Ignore labels in span finder scorer

* Format

* Add span_finder to quickstart template

* Move settings to self.cfg, store min/max unset as None

* Remove debugging

* Update docstrings and docs

* Update spacy/pipeline/span_finder.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Fix imports

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-06-07 15:52:28 +02:00
Isabel Zimmerman
05df59fd4a
[DOCS] add vetiver to spacy universe (#12557)
* add vetiver to spacy universe

* remove image

* update logo to render correctly in thumbnail

* apply Basil's suggestion

Co-authored-by: Basile Dura <bdura@users.noreply.github.com>

* refer to the same model

---------

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Basile Dura <bdura@users.noreply.github.com>
2023-06-01 17:11:18 +02:00
Vinit Ravishankar
f0e0206b77
update universe for spacypdfreader (#12661) 2023-05-23 13:28:48 +02:00
Victoria
6930a6bf45
Add spaCy VSCode extension materials (#12592) 2023-05-19 14:38:53 +02:00
Adriane Boyd
df083f91a5
Add Malay to website languages (#12643) 2023-05-17 13:13:43 +02:00
David Berenstein
83b6f488cb
universe: Update examples for Adept Augmentation (#12620)
* Update universe.json

* chore: changed readme example as suggested by Vincent Warmerdam (koaning)
2023-05-15 14:09:33 +02:00
royashcenazi
3252f6b13f
Parsigs universe 3 (#12617)
* parsigs universe

* added model installation explanation in the description

* Update website/meta/universe.json

Co-authored-by: Basile Dura <bdura@users.noreply.github.com>

* added model installation instructions in the code example

* added biomedical category

---------

Co-authored-by: Basile Dura <bdura@users.noreply.github.com>
2023-05-10 13:49:51 +02:00
royashcenazi
a56ab98e3c
parsigs universe (#12616)
* parsigs universe

* added model installation explanation in the description

* Update website/meta/universe.json

Co-authored-by: Basile Dura <bdura@users.noreply.github.com>

* added model installation instructions in the code example

---------

Co-authored-by: Basile Dura <bdura@users.noreply.github.com>
2023-05-10 13:19:28 +02:00
David Berenstein
d11b549195
chore: added adept-augmentations to the spacy universe (#12609)
* chore: added adept-augmentations to the spacy universe

* Apply suggestions from code review

Co-authored-by: Basile Dura <bdura@users.noreply.github.com>

* Update universe.json

---------

Co-authored-by: Basile Dura <bdura@users.noreply.github.com>
2023-05-10 13:16:16 +02:00
Patrick J. Burns
15f16db6ca
Fix typo (#12615) 2023-05-09 15:52:34 +02:00
Patrick J. Burns
eb3960a15a
Add LatinCy models to universe.json (#12597)
* Add LatinCy models to universe.json

* Update website/meta/universe.json

Add install code for LatinCy models to 'code_example'

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update LatinCy ‘code_example’ in website/meta/universe.json

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-05-09 12:02:45 +02:00
Victoria
a8dfc66135
Add spacy-wasm to universe (#12572)
* add spacy-wasm to universe

* add tag
2023-04-26 14:18:40 +02:00
moxley01
070fa16545
add spacysee project (#12568) 2023-04-25 12:30:19 +02:00
andyjessen
02259fa195
Add category to spaCy project (#12506)
ScispaCy fits within the biomedical domain. Consider adding this category.
2023-04-07 15:31:04 +02:00
sloev / Johannes Valbjørn
fd072533e7
add spacy_onnx_sentiment_english to universe (#12422)
* add spacy_onnx_sentiment_english to universe

* rename to sentimental-onix

* fix comma json error

* fix typo

* typo fix

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* mention need to download model before example works

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-03-27 11:35:14 +02:00
Adriane Boyd
8ea15240ca
Update binder version to v3.5 (#12153) 2023-01-25 13:14:23 +01:00
Marcus Blättermann
3062fae2ca
Fix broken URL (#12176) 2023-01-25 11:42:19 +01:00
Sofie Van Landeghem
0f5d8a27f2
3.5 usage page (#12057)
* skeleton

* Fill in non-CLI details from release notes draft

* Add TODO for fuzzy matching

* Website updates for v3-5 draft

* Fill in usage examples

* Add fuzzy matching to intro

* Fix fuzzy examples

* Shell example formatting

* Fix typo

* Format

* Remove trailing periods in internal list

* Update

* Fix spacing for nested lists

* Update InMemoryLookupKB link

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Ines Montani <ines@ines.io>
2023-01-19 16:13:04 +01:00
Adriane Boyd
3b8918e166
API docs: Rename kb_in_memory to inmemorylookupkb, add to sidebar (#12128)
* API docs: Rename kb_in_memory to inmemorylookupkb, add to sidebar

* adjust to mdx

* linkout to InMemoryLookupKB at first occurrence in kb.mdx

* fix links to docs

* revert Azure trigger setting (I'll make a separate PR)

Co-authored-by: svlandeg <svlandeg@github.com>
2023-01-19 13:29:17 +01:00
Sofie Van Landeghem
554df9ef20
Website migration from Gatsby to Next (#12058)
* Rename all MDX files to `.mdx`

* Lock current node version (#11885)

* Apply Prettier (#11996)

* Minor website fixes (#11974) [ci skip]

* fix table

* Migrate to Next WEB-17 (#12005)

* Initial commit

* Run `npx create-next-app@13 next-blog`

* Install MDX packages

Following: 77b5f79a4d/packages/next-mdx/readme.md

* Add MDX to Next

* Allow Next to handle `.md` and `.mdx` files.

* Add VSCode extension recommendation

* Disabled TypeScript strict mode for now

* Add prettier

* Apply Prettier to all files

* Make sure to use correct Node version

* Add basic implementation for `MDXRemote`

* Add experimental Rust MDX parser

* Add `/public`

* Add SASS support

* Remove default pages and styling

* Convert to module

This allows using `import/export` syntax

* Add import for custom components

* Add ability to load plugins

* Extract function

This will make the next commit easier to read

* Allow handling directories for page creation

* Refactoring

* Allow parsing subfolders for pages

* Extract logic

* Redirect `index.mdx` to parent directory

* Disabled ESLint during builds

* Disabled typescript during build

* Remove Gatsby from `README.md`

* Rephrase Docker part of `README.md`

* Update project structure in `README.md`

* Move and rename plugins

* Update plugin for wrapping sections

* Add dependencies for  plugin

* Use  plugin

* Rename wrapper type

* Simplify unnecessary adding of IDs to sections

The slugified section IDs are useless because they cannot be referenced anywhere. Navigation only works if the section has the same ID as the heading.

* Add plugin for custom attributes on Markdown elements

* Add plugin to re-add support for tables

* Add plugin to fix problem with wrapped images

For more details see this issue: https://github.com/mdx-js/mdx/issues/1798

* Add necessary meta data to pages

* Install necessary dependencies

* Remove outdated MDX handling

* Remove reliance on `InlineList`

* Use existing Remark components

* Remove unallowed heading

Previously, `h1` components were not overwritten, so they would never have worked, and they aren't used anywhere either.

* Add missing components to MDX

* Add correct styling

* Fix broken list

* Fix broken CSS classes

* Implement layout

* Fix links

* Fix broken images

* Fix pattern image

* Fix heading attributes

* Rename heading attribute

`new` was causing some weird issue, so renaming it to `version`

* Update comment syntax in MDX

* Merge imports

* Fix markdown rendering inside components

* Add model pages

* Simplify anchors

* Fix default value for theme

* Add Universe index page

* Add Universe categories

* Add Universe projects

* Fix Next problem with copy

Next complains when the server renders something different than the client, so we move the differing logic to `useEffect`.

* Fix improper component nesting

Next doesn't allow block elements inside a `<p>`

* Replace landing page MDX with page component

* Remove inlined iframe content

* Remove ability to inline HTML content in iFrames

* Remove MDX imports

* Fix problem with image inside link in MDX

* Escape character for MDX

* Fix unescaped characters in MDX

* Fix headings with logo

* Allow exporting static HTML pages

* Add prebuild script

This command is automatically run by Next

* Replace `svg-loader` with `react-inlinesvg`

`svg-loader` is no longer maintained

* Fix ESLint `react-hooks/exhaustive-deps`

* Fix dropdowns

* Change code language from `cli` to `bash`

* Remove unnecessary language `none`

* Fix invalid code language

`markdown_` with an underscore was used to turn off syntax highlighting, but unknown languages now throw an error.

* Enable code blocks plugin

* Re-add `InlineCode` component

MDX2 removed the `inlineCode` component

> The special component name `inlineCode` was removed, we recommend to use `pre` for the block version of code, and code for both the block and inline versions

Source: https://mdxjs.com/migrating/v2/#update-mdx-content

* Remove unused code

* Extract function to own file

* Fix code syntax highlighting

* Update syntax for code block meta data

* Remove unused prop

* Fix internal link recognition

There is a regex inconsistency between Node and the browser, and since Next runs the component in both, this creates an error.

``Prop `rel` did not match. Server: "null" Client: "noopener nofollow noreferrer"``

This simplifies the implementation and fixes the above error.

* Replace `react-helmet` with `next/head`

* Fix `className` problem for JSX component

* Fix broken bold markdown

* Convert file to `.mjs` to be used by Node process

* Add plugin to replace strings

* Fix custom table row styling

* Fix problem with `span` inside inline `code`

React doesn't allow a `span` inside an inline `code` element and throws an error in dev mode.

* Add `_document` to be able to customize `<html>` and `<body>`

* Add `lang="en"`

* Store Netlify settings in file

This way we don't need to update via Netlify UI, which can be tricky if changing build settings.

* Add sitemap

* Add Smartypants

* Add PWA support

* Add `manifest.webmanifest`

* Fix bug with anchor links after reloading

There was no need for the previous implementation, since the browser handles this natively. Additionally, the manual scrolling into view was broken, because the heading would disappear behind the menu bar.

* Rename custom event

I was googling for ages to find out what kind of event `inview` is, only to discover it was a custom event with a name that sounds very much like a native one. 🫠

* Fix missing comment syntax highlighting

* Refactor Quickstart component

The previous implementation was hiding the irrelevant lines via data-props and dynamically generated CSS. This created problems with Next and was also hard to follow. CSS was used to do what React is supposed to handle.

The new implementation simply filters the list of children (React elements) via their props.

* Fix syntax highlighting for Training Quickstart

* Unify code rendering

* Improve error logging in Juniper

* Fix Juniper component

* Automatically generate "Read Next" link

* Add Plausible

* Use recent DocSearch component and adjust styling

* Fix images

* Turn off image optimization

> Image Optimization using Next.js' default loader is not compatible with `next export`.

We currently deploy to Netlify via `next export`

* Don't build pages starting with `_`

* Remove unused files

* Add Next plugin to Netlify

* Fix button layout

MDX automatically adds `p` tags around text on a new line, and Prettier wants to put the text on a new line. This works around it with a JSX string.

* Add 404 page

* Apply Prettier

* Update Prettier for `package.json`

Next sometimes wants to patch `package-lock.json`. The old Prettier setting indented with 4 spaces, but Next always indents with 2 spaces. Since `npm install` automatically uses the indentation from `package.json` for `package-lock.json`, and to avoid the format switching back and forth, both files are now set to 2 spaces.

* Apply Next patch to `package-lock.json`

When starting the dev server Next would warn `warn  - Found lockfile missing swc dependencies, patching...` and update the `package-lock.json`. These are the patched changes.

* fix link

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* small backslash fixes

* adjust to new style

Co-authored-by: Marcus Blättermann <marcus@essenmitsosse.de>
2023-01-11 17:30:07 +01:00
Wannaphong Phatthiyaphaibun
31c1beba78
Add spacy-pythainlp (#12038)
* Add spacy-pythainlp

* Move submission to right section

* Minor cleanup

* Remove extra list call

* Update universe.json

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2023-01-03 17:03:59 +09:00
vincent d warmerdam
6d2ca1ab3a
Update custom solutions links (#11903)
* Update custom solutions

Will now point to https://explosion.ai/custom-solutions

* added-sidebar

* added-analysis-to-readme

* update-landing-page
2022-12-07 16:02:09 +01:00
Paul O'Leary McCann
73919336fb
Remove spacy-sentence-segmenter from Universe (#11932) 2022-12-07 15:56:03 +01:00
Paul O'Leary McCann
916191848a
Update scattertext example code (#11937)
* Update scattertext example code

* Remove PMI Filter Threshold
2022-12-07 18:09:04 +09:00
Zhangrp
9f986af120
Add example sentence for Chinese in website meta (#11879) 2022-11-28 14:50:30 +09:00
Paul O'Leary McCann
8271cfb4cd
Remove Learning Path spaCy (#11846) 2022-11-23 11:03:18 +01:00
Paul O'Leary McCann
e3173bd86d
Remove spikex from Universe (#11825) 2022-11-18 08:24:22 +01:00
Paul O'Leary McCann
bb523d4d91
Remove spacy-ray from docs (#11781)
* Remove spacy ray from cli docs

* Remove more ray docs

* Remove ray from universe
2022-11-14 19:58:38 +09:00
Jacobo Myerston
322b5dc1df
Add greCy to Universe (#11774)
* Update universe.json

* Update universe.json

fixes Github value
2022-11-10 13:21:20 +09:00
Adriane Boyd
8740e4341f
Update languages and version in README and website (#11694) 2022-10-25 14:54:54 +02:00
Adriane Boyd
6c380d4fc6 Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.5 2022-10-20 13:45:17 +02:00
Adriane Boyd
7e56701057 Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.5 2022-10-20 13:38:49 +02:00
Cellan Hall
b69d249a22
Adding spacy-cleaner to the spaCy universe (#11674)
* added spacy-cleaner to the spaCy universe

* Move data to right section of universe.json

* Cleanup

- fix typo ("replacers")
- spaCy doesn't need to be marked as code
- lemma of "Hello" is lower case

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2022-10-20 20:38:29 +09:00
Paul O'Leary McCann
2e52479eec
Fix example code for spacy-wordnet (#11593)
* Fix example code for spacy-wordnet

It looks like in the most recent version, 0.1.0, it's no longer possible
to pass the lang parameter to the component separately. Doing so will
raise an error.

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Cleanup

* More cleanup

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-10-11 16:45:05 +02:00
svlandeg
9c8cdb403e Merge branch 'master_copy' into develop_copy 2022-09-30 15:40:26 +02:00
Gabriele Picco
ff9002b726
Add Zshot Spacy plugin (#11557)
* Add Zshot Spacy plugin

Add Zshot (Zero and Few shot named entity & relationships recognition) Spacy plugin

* Update website/meta/universe.json

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/meta/universe.json

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-09-29 17:34:44 +02:00
Taniguchi Yasufumi
9557b0fb01
Add spacy-partial-tagger to spaCy Universe (#11538) 2022-09-27 14:11:50 +02:00
Paul O'Leary McCann
a44b7d4622
Add experimental coref docs (#11291)
* Add experimental coref docs

* Docs cleanup

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Apply changes from code review

* Fix prettier formatting

It seems a period after a number made this think it was a list?

* Update docs on examples for initialize

* Add docs for coref scorers

* Remove 3.4 notes from coref

There won't be a "new" tag until it's in core.

* Add docs for span cleaner

* Fix docs

* Fix docs to match spacy-experimental

These weren't properly updated when the code was moved out of spacy
core.

* More doc fixes

* Formatting

* Update architectures

* Fix links

* Fix another link

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: svlandeg <svlandeg@github.com>
2022-09-27 18:11:23 +09:00
Basile Dura
f40d2fac29
fix: remove duplicate v3.2 (#11530) 2022-09-23 13:18:51 +02:00
shademe
21000ae935
Merge branch 'master' into merge-master-into-develop 2022-09-06 17:50:07 +02:00
Paul O'Leary McCann
ff0522f8da Fix asent pip package name 2022-09-06 19:19:05 +09:00
Adriane Boyd
81874265e9 Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.5-1 2022-08-24 12:47:42 +02:00
Tobius Saul
c09d2fa25b
luganda language extension (#10847)
* luganda language extension

* __init__.py changes

* New enhancements

* Lexical attribute changed

* punctuation and sentence additions

* Remove comment header

* Fix typos, reformat

* reformatted version

* Add tokenizer test

* Remove contractions from stop words

* Format

* Add Luganda to website

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-08-23 13:09:36 +02:00