Commit Graph

3141 Commits

Author SHA1 Message Date
Ines Montani
65e7bd54f5 Update usage sidebar and nav alert [ci skip] 2023-10-06 14:36:37 +02:00
Ines Montani
b83f1e3724
Inline displaCy visualizations in docs (#13050) [ci skip] 2023-10-06 14:22:43 +02:00
Raphael Mitsch
be29216fe2
Merge pull request #13044 from explosion/docs/llm_main
Sync `master` with `docs/llm_main`
2023-10-05 16:10:19 +02:00
Raphael Mitsch
1162fcf099
Add Mistral mentions. (#13037) 2023-10-05 14:44:38 +02:00
Raphael Mitsch
862f8254e8
Add docs on Azure OpenAI support in spacy-llm (#13043)
* Add gpt-3.5-turbo-instruct to list of supported OpenAI models.

* Update `spacy-llm` task argument docs w.r.t. task refactoring (#12995)

* Update task arguments w.r.t. task refactoring in 0.5.0.

* Add disclaimer w.r.t. gated models/Llama 2.

* Update website/docs/api/large-language-models.mdx

* Update website/docs/api/large-language-models.mdx

* Update docs w.r.t. PaLM support. (#13018)

* Add info on spacy.Azure.v1.

* Attempt to fix netlify check fails.

* Attempt to fix netlify check fails.

* Attempt to fix netlify check fails.

* Attempt to fix netlify check fails.

* Attempt to fix netlify check fails.

* Attempt to fix netlify check fails.

* Attempt to fix netlify check fails.

* Attempt to fix netlify check fails.

* Attempt to fix netlify check fails.

* Format.
2023-10-05 13:18:27 +02:00
Raphael Mitsch
1dec138e61
Update docs w.r.t. PaLM support. (#13018) 2023-10-05 08:50:41 +02:00
Adriane Boyd
6e54360a3d
Remove pathy dependency, update docs for cloudpathlib in Weasel (#13035) 2023-10-05 08:50:22 +02:00
Raphael Mitsch
734826db79
Update spacy-llm task argument docs w.r.t. task refactoring (#12995)
* Update task arguments w.r.t. task refactoring in 0.5.0.

* Add disclaimer w.r.t. gated models/Llama 2.

* Update website/docs/api/large-language-models.mdx

* Update website/docs/api/large-language-models.mdx
2023-10-05 08:45:25 +02:00
Adriane Boyd
92ce32aa3f
Update binder version to v3.7 (#13034) 2023-10-02 12:53:46 +02:00
Adriane Boyd
160e61772e
Docs for v3.7.0 (#13029)
* Docs for v3.7.0

* Minor fixes

* Extend Weasel notes

* Minor edits

* Update version in README
2023-10-01 21:40:07 +02:00
Adriane Boyd
406794a081 Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.7-1 2023-09-28 15:09:06 +02:00
Sergiu Nisioi
6255e38695
Adding rolegal model to the spaCy universe (#13017)
* adding rolegal model to the spaCy universe

* Fix formatting

* Use raw URL

* update image url and example

* fix pip and update url to raw

* okay, let's add thumb instead of image 🐙

* Update website/meta/universe.json

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-09-28 11:06:50 +02:00
Madeesh Kannan
b4501db6f8
Update emoji library in rule-based matcher example (#13014) 2023-09-25 18:20:30 +02:00
Adriane Boyd
ff4215f1c7
Drop support for python 3.6 (#13009)
* Drop support for python 3.6

* Update docs
2023-09-25 14:48:38 +02:00
Adriane Boyd
935a5455b6
Docs: add new tag for evaluate CLI --spans-keys (#13013) 2023-09-25 11:49:28 +02:00
Eliana Vornov
4e3360ad12
add --spans-key option for CLI spancat evaluation (#12981)
* add span key option for CLI evaluation

* Rephrase CLI help to refer to Doc.spans instead of spancat

* Rephrase docs to refer to Doc.spans instead of spancat

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-09-25 11:25:41 +02:00
Raphael Mitsch
bef9f63e13 Add gpt-3.5-turbo-instruct to list of supported OpenAI models. 2023-09-21 11:28:58 +02:00
Sofie Van Landeghem
8f0d6b0a8c
Fix in BertTokenizer docs (#12955)
* fix BertWordPieceTokenizer constructor call

* fix

* Update website/docs/usage/linguistic-features.mdx

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-09-13 13:21:58 +02:00
Sofie Van Landeghem
013762be41
Few spacy-llm doc fixes (#12969)
* fix construction example

* shorten task-specific factory list

* small edits to HF models

* small edit to API models

* typo

* fix space

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>

---------

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
2023-09-08 11:35:38 +02:00
Sofie Van Landeghem
def7013eec
Docs for spacy-llm 0.5.0 (#12968)
* Update incorrect example config. (#12893)

* spacy-llm docs cleanup (#12945)

* Shorten NER section

* fix template references

* simplify sections

* set temperature to 0.0 in examples

* condense model information

* fix parameters for REST models

* set temperature to 0.0

* spelling fix

* trigger preview

* fix quotes

* add small note on noop.v1

* move up example noop config

* set appropriate model example configs

* explain config

* fix

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>

---------

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>

* Docs for ner.v3 and spancat.v3 spacy-llm tasks (#12949)

* formatting

* update usage table with NER.v3

* fix typo in links

* v3 overview of parameters

* add spancat.v3

* add further v3 explanations

* remove TODO comment

* few more small fixes

* Add doc section on LLM + task factories (#12905)

* Add section on LLM + task factories.

* Apply suggestions from code review

---------

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* add default config to openai models (#12961)

* Docs for spacy-llm 0.5.0 (#12967)

* simplify Python example

* simplify Python example

* Refer only to latest OpenAI model versions from usage doc

* Typo fix

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>

* clarify accuracy claim

---------

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>

---------

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
2023-09-08 10:25:14 +02:00
Magdalena Aniol
cc78847688
fix training.batch_size example (#12963) 2023-09-06 16:38:13 +02:00
Sofie Van Landeghem
6d1f6d9a23
Fix LLM usage example (#12950)
* fix usage example

* revert back to v2 to allow hot fix on main
2023-09-04 09:05:50 +02:00
Sofie Van Landeghem
5c1f9264c2
fix typo in link (#12948)
* fix typo in link

* fix REL.v1 parameter
2023-09-01 13:47:20 +02:00
David Berenstein
065ead4eed
updated add_pipe docs (#12947) 2023-09-01 11:05:36 +02:00
vincent d warmerdam
3e4264899c
Update large-language-models.mdx (#12944) 2023-08-30 11:58:14 +02:00
Ines Montani
52758e1afa Add headers to netlify.toml [ci skip] 2023-08-30 11:55:23 +02:00
Vinit Ravishankar
c2303858e6
Documentation for spacy-curated-transformers (#12677)
* initial

* initial documentation run

* fix typo

* Remove mentions of Torchscript and quantization

Both are disabled in the initial release of `spacy-curated-transformers`.

* Fix `piece_encoder` entries

* Remove `spacy-transformers`-specific warning

* Fix duplicate entries in tables

* Doc fixes

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Remove type aliases

* Fix copy-paste typo

* Change `debug pieces` version tag to `3.7`

* Set curated transformers API version to  `3.7`

* Fix transformer listener naming

* Add docs for `init fill-config-transformer`

* Update CLI command invocation syntax

* Update intro section of the pipeline component docs

* Fix source URL

* Add a note to the architectures section about the `init fill-config-transformer` CLI command

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update CLI command name, args

* Remove hyphen from the `curated-transformers.mdx` filename

* Fix links

* Remove placeholder text

* Add text to the model/tokenizer loader sections

* Fill in the `DocTransformerOutput` section

* Formatting fixes

* Add curated transformer page to API docs sidebar

* More formatting fixes

* Remove TODO comment

* Remove outdated info about default config

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Add link to HF model hub

* `prettier`

---------

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-08-29 17:52:16 +02:00
PD Hall
d8a32c1050
docs: fix ngram_range_suggester max_size description (#12939) 2023-08-29 11:10:58 +02:00
Connor Brinton
6dd56868de
📝 Fix formula for receptive field in docs (#12918)
SpaCy's HashEmbedCNN layer performs convolutions over tokens to produce
contextualized embeddings using a `MaxoutWindowEncoder` layer. These
convolutions are implemented using Thinc's `expand_window` layer, which
concatenates `window_size` neighboring sequence items on either side of
the sequence item being processed. This is repeated across `depth`
convolutional layers.

For example, consider the sequence "ABCDE" and a `MaxoutWindowEncoder`
layer with a context window of 1 and a depth of 2. We'll focus on the
token "C". We can visually represent the contextual embedding produced
for "C" as:
```mermaid
flowchart LR
A0(A<sub>0</sub>)
B0(B<sub>0</sub>)
C0(C<sub>0</sub>)
D0(D<sub>0</sub>)
E0(E<sub>0</sub>)
B1(B<sub>1</sub>)
C1(C<sub>1</sub>)
D1(D<sub>1</sub>)
C2(C<sub>2</sub>)
A0 --> B1
B0 --> B1
C0 --> B1
B0 --> C1
C0 --> C1
D0 --> C1
C0 --> D1
D0 --> D1
E0 --> D1
B1 --> C2
C1 --> C2
D1 --> C2
```

Described in words, this graph shows that before the first layer of the
convolution, the "receptive field" centered at each token consists only
of that same token. That is to say, that we have a receptive field of 1.
The first layer of the convolution adds one neighboring token on either
side to the receptive field. Since this is done on both sides, the
receptive field increases by 2, giving the first layer a receptive field
of 3. The second layer of the convolutions adds an _additional_
neighboring token on either side to the receptive field, giving a final
receptive field of 5.

However, this doesn't match the formula currently given in the docs,
which read:
> The receptive field of the CNN will be
> `depth * (window_size * 2 + 1)`, so a 4-layer network with a window
> size of `2` will be sensitive to 20 words at a time.

Substituting in our depth of 2 and window size of 1, this formula gives
us a receptive field of:
```
depth * (window_size * 2 + 1)
= 2 * (1 * 2 + 1)
= 2 * (2 + 1)
= 2 * 3
= 6
```

This not only doesn't match our computations from above, it's also an
even number! This is suspicious, since the receptive field is supposed
to be centered on a token, and not between tokens. Generally, this
formula results in an even number for any even value of `depth`.

The error in this formula is that the adjustment for the center token
is multiplied by the depth, when it should occur only once. The
corrected formula, `depth * window_size * 2 + 1`, gives the correct
value for our small example from above:
```
depth * window_size * 2 + 1
= 2 * 1 * 2 + 1
= 4 + 1
= 5
```

These changes update the docs to correct the receptive field formula and
the example receptive field size.
2023-08-21 10:52:32 +02:00
Adriane Boyd
76a9f9c6c6
Docs: clarify abstract spacy.load examples (#12889) 2023-08-16 17:28:34 +02:00
William Mattingly
64b8ee2dbe
Update universe.json (#12904)
* Update universe.json

added hobbit-spacy to the universe json

* Update universe.json

removed displacy from hobbit-spacy and added a default text.
2023-08-14 16:44:14 +02:00
Adriane Boyd
c4e378df97
Update CuPy extras (#12890)
* Add `cuda12x` for `cupy-cuda12x`.
* Drop `cuda-autodetect` from quickstart, set default to `cuda11x`
instead.
2023-08-08 12:58:28 +02:00
Sofie Van Landeghem
3b7faf4f5e
fix (#12881) 2023-08-03 08:37:43 +02:00
Arman Mohammadi
07407e07ab
fix the regular expression matching on the full text (#12883)
There was a mistake in the regex pattern which caused not matching all the desired tokens. The problem was that when we use r string literal prefix to suppose a raw text, we should not use two backslashes to demonstrate a backslash.
2023-08-02 16:52:26 +02:00
Adriane Boyd
0fe43f40f1
Support registered vectors (#12492)
* Support registered vectors

* Format

* Auto-fill [nlp] on load from config and from bytes/disk

* Only auto-fill [nlp]

* Undo all changes to Language.from_disk

* Expand BaseVectors

These methods are needed in various places for training and vector
similarity.

* isort

* More linting

* Only fill [nlp.vectors]

* Update spacy/vocab.pyx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Revert changes to test related to auto-filling [nlp]

* Add vectors registry

* Rephrase error about vocab methods for vectors

* Switch to dummy implementation for BaseVectors.to_ops

* Add initial draft of docs

* Remove example from BaseVectors docs

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/docs/api/basevectors.mdx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Fix type and lint bpemb example

* Update website/docs/api/basevectors.mdx

---------

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-08-01 15:46:08 +02:00
Andy Friedman
186889ec9c
added entry for SaysWho (#12828)
* Update universe.json

added entry for Sayswho

* Update universe.json

updated sayswho entry

* Update universe.json

* Update website/meta/universe.json

* Update website/meta/universe.json

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-07-31 10:52:32 +02:00
Sofie Van Landeghem
c9e9dccf79
Add displaCy data structures to docs (2) (#12875)
* Add data structures to docs

* Adjusted descriptions for more consistency

* Add _optional_ flag to parameters

* Add tests and adjust optional title key in doc

* Add title to dep visualizations

* fix typo

---------

Co-authored-by: thomashacker <EdwardSchmuhl@web.de>
2023-07-31 10:47:57 +02:00
Victoria
49055ed7c8
Add cli for finding locations of registered func (#12757)
* Add cli for finding locations of registered func

* fixes: naming and typing

* isort

* update naming

* remove to find-function

* remove file:// bit

* use registry name if given and exit gracefully if a registry was not found

* clean up failure msg

* specify registry_name options

* mypy fixes

* return location for internal usage

* add documentation

* more mypy fixes

* clean up example

* add section to menu

* add tests

---------

Co-authored-by: svlandeg <svlandeg@github.com>
2023-07-31 09:39:00 +02:00
Márton Kardos
51b9655470
Added OdyCy to spaCy Universe (#12826)
* Added OdyCy to spaCy Universe

* Replaced template tags

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-07-26 16:05:53 +02:00
Madeesh Kannan
98799d849e
SpanCat: Remove invalid threshold config argument (#12860) 2023-07-26 13:56:31 +02:00
Victoria
e2b89012a2
Add spacy-llm docs to website (#12782)
* initial commit

* update for v0.4.0

* Apply suggestions from code review

* Fix formatting

* Apply suggestions from code review

* Update website/docs/api/large-language-models.mdx

* Update website/docs/api/large-language-models.mdx

* update usage page

* Apply suggestions from review

* Apply suggestions from review

* fix links

* fix relative links

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Apply suggestions from review

* Add section on Llama 2. Format.

---------

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-07-24 14:44:47 +02:00
Adriane Boyd
95075298f5
Update pex Makefile defaults (#12832)
* Update pex Makefile defaults

- switch to python 3.8
- only install spacy-lookups-data for extra packages

* Update website for pex defaults
2023-07-18 09:29:04 +02:00
Ian Thompson
ef20e114e0
Typo fix in Language.replace_listeners docs (#12823)
* modified:   spacy/language.py
	- corrected typo in docstring for :method:`Language.replace_listeners`
	- added noqa comment on unused local variable assignment in :method:`Language.from_config` as I wasn't sure if it should be unassigned

modified:   website/docs/api/language.mdx
	- corrected typo in `Language.replace_listeners` markdown

* modified:   spacy/language.py
	- removed noqa comment

---------

Co-authored-by: Ian Thompson <ian.thompson@hrblock.com>
2023-07-14 09:45:54 +02:00
Sofie Van Landeghem
ddffd09602
Trainable lemmatizer docs link (#12795)
* add an anchor to the trainable lemmatizer section

* add requirement for morphologizer,tagger to rule-based lemmatizer

* morphologizer only
2023-07-07 15:18:16 +02:00
Adriane Boyd
1a55661cfb
Update website binder version to v3.6 (#12805) 2023-07-07 10:52:33 +02:00
Adriane Boyd
41dba5bd34
Update max_length default in span finder docs (#12803) 2023-07-07 10:17:41 +02:00
Adriane Boyd
4e19ec7eb8
Docs for v3.6.0 (#12792)
* Docs for v3.6.0

* Add sl performance

* Add da trf note
2023-07-06 12:58:25 +02:00
Tom Aarsen
eab929361d
Use 'exclude' instead of 'disable' (#12783)
as suggested by @svlandeg
2023-07-04 11:45:13 +02:00
Marcus Blättermann
bd239511a4
Fix problem with missing syntax highlighting languages causing runtime crash on the website (#12781)
* Fix problem with universe pages using `docker` language

* Fix problem with universe pages using `r` language

* Add fallback, in case code language is unknown
2023-07-03 10:24:25 +02:00
Daniël de Kok
57a230c6e4
Remove section about parallel training with Ray (#12770)
The Ray integration is currently broken, having these docs around
suggest that this functionality is currently available.
2023-06-28 17:09:57 +02:00