Commit Graph

1113 Commits

Author SHA1 Message Date
Matthew Honnibal
3ecec1324c
Usage page on memory management, explaining memory zones and doc_cleaner (#13643) [ci skip]
Co-authored-by: Ines Montani <ines@ines.io>
2024-10-23 12:42:54 +02:00
Ikko Eltociear Ashimine
15fbf5ef36
docs: update rule-based-matching.mdx (#13665) [ci skip] 2024-10-23 12:07:01 +02:00
Alex Strick van Linschoten
045cd43c3f
Fix typos in docs (#13466)
* fix typos

* prettier formatting

---------

Co-authored-by: svlandeg <svlandeg@github.com>
2024-04-29 11:10:17 +02:00
Daniël de Kok
14bd9d89a3
Update example that shows model in requirments (#13302)
See #13293.
2024-02-11 19:46:43 +01:00
Raphael Mitsch
256468c414 Merge branch 'docs/llm_main' into chore/sync-master-with-llm_main
# Conflicts:
#	website/docs/api/large-language-models.mdx
2024-01-19 16:34:35 +01:00
Raphael Mitsch
0062c22c35
Updated docs w.r.t. infinite doc length changes (#13214)
* Updated docs w.r.t. infinite doc length.

* Fix typo.

* fix typo's

* Fix table formatting.

* Update formatting.

---------

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2024-01-05 14:20:58 +01:00
Raphael Mitsch
55ed2b4e82
Add documentation for EL task (#12988)
* Add documentation for EL task.

* Fix EL factory name.

* Add llm_entity_linker_mentio.

* Apply suggestions from code review

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Update EL task docs.

* Update EL task docs.

* Update EL task docs.

* Update EL task docs.

* Update EL task docs.

* Update EL task docs.

* Update EL task docs.

* Update EL task docs.

* Update EL task docs.

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Incorporate feedback.

* Format.

* Fix link to KB data.

---------

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2023-12-04 15:23:28 +01:00
Daniël de Kok
da7ad97519
Update TextCatBOW to use the fixed SparseLinear layer (#13149)
* Update `TextCatBOW` to use the fixed `SparseLinear` layer

A while ago, we fixed the `SparseLinear` layer to use all available
parameters: https://github.com/explosion/thinc/pull/754

This change updates `TextCatBOW` to `v3` which uses the new
`SparseLinear_v2` layer. This results in a sizeable improvement on a
text categorization task that was tested.

While at it, this `spacy.TextCatBOW.v3` also adds the `length_exponent`
option to make it possible to change the hidden size. Ideally, we'd just
have an option called `length`. But the way that `TextCatBOW` uses
hashes results in a non-uniform distribution of parameters when the
length is not a power of two.

* Replace TexCatBOW `length_exponent` parameter by `length`

We now round up the length to the next power of two if it isn't
a power of two.

* Remove some tests for TextCatBOW.v2

* Fix missing import
2023-11-29 09:11:54 +01:00
Adriane Boyd
513bbd5fa3
Add preferred use of build for package CLI (#13109)
Build with `build` if available. Warn and fall back to previous
`setup.py`-based builds if `build` build fails.
2023-11-08 17:35:24 +01:00
Sofie Van Landeghem
48248c62b6
Clarify EL example in docs (#13071)
* add comment that pipeline is a custom one

* add link to NEL tutorial

* prettier

* revert prettier reformat

* revert prettier reformat (2)

* fix typo

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>

---------

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
2023-10-31 21:58:29 +01:00
Ines Montani
b83f1e3724
Inline displaCy visualizations in docs (#13050) [ci skip] 2023-10-06 14:22:43 +02:00
Raphael Mitsch
be29216fe2
Merge pull request #13044 from explosion/docs/llm_main
Sync `master` with `docs/llm_main`
2023-10-05 16:10:19 +02:00
Raphael Mitsch
1162fcf099
Add Mistral mentions. (#13037) 2023-10-05 14:44:38 +02:00
Raphael Mitsch
862f8254e8
Add docs on Azure OpenAI support in spacy-llm (#13043)
* Add gpt-3.5-turbo-instruct to list of supported OpenAI models.

* Update `spacy-llm` task argument docs w.r.t. task refactoring (#12995)

* Update task arguments w.r.t. task refactoring in 0.5.0.

* Add disclaimer w.r.t. gated models/Llama 2.

* Update website/docs/api/large-language-models.mdx

* Update website/docs/api/large-language-models.mdx

* Update docs w.r.t. PaLM support. (#13018)

* Add info on spacy.Azure.v1.

* Attempt to fix netlify check fails.

* Attempt to fix netlify check fails.

* Attempt to fix netlify check fails.

* Attempt to fix netlify check fails.

* Attempt to fix netlify check fails.

* Attempt to fix netlify check fails.

* Attempt to fix netlify check fails.

* Attempt to fix netlify check fails.

* Attempt to fix netlify check fails.

* Format.
2023-10-05 13:18:27 +02:00
Raphael Mitsch
1dec138e61
Update docs w.r.t. PaLM support. (#13018) 2023-10-05 08:50:41 +02:00
Adriane Boyd
6e54360a3d
Remove pathy dependency, update docs for cloudpathlib in Weasel (#13035) 2023-10-05 08:50:22 +02:00
Adriane Boyd
160e61772e
Docs for v3.7.0 (#13029)
* Docs for v3.7.0

* Minor fixes

* Extend Weasel notes

* Minor edits

* Update version in README
2023-10-01 21:40:07 +02:00
Adriane Boyd
406794a081 Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.7-1 2023-09-28 15:09:06 +02:00
Madeesh Kannan
b4501db6f8
Update emoji library in rule-based matcher example (#13014) 2023-09-25 18:20:30 +02:00
Adriane Boyd
ff4215f1c7
Drop support for python 3.6 (#13009)
* Drop support for python 3.6

* Update docs
2023-09-25 14:48:38 +02:00
Sofie Van Landeghem
8f0d6b0a8c
Fix in BertTokenizer docs (#12955)
* fix BertWordPieceTokenizer constructor call

* fix

* Update website/docs/usage/linguistic-features.mdx

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-09-13 13:21:58 +02:00
Sofie Van Landeghem
def7013eec
Docs for spacy-llm 0.5.0 (#12968)
* Update incorrect example config. (#12893)

* spacy-llm docs cleanup (#12945)

* Shorten NER section

* fix template references

* simplify sections

* set temperature to 0.0 in examples

* condense model information

* fix parameters for REST models

* set temperature to 0.0

* spelling fix

* trigger preview

* fix quotes

* add small note on noop.v1

* move up example noop config

* set appropriate model example configs

* explain config

* fix

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>

---------

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>

* Docs for ner.v3 and spancat.v3 spacy-llm tasks (#12949)

* formatting

* update usage table with NER.v3

* fix typo in links

* v3 overview of parameters

* add spancat.v3

* add further v3 explanations

* remove TODO comment

* few more small fixes

* Add doc section on LLM + task factories (#12905)

* Add section on LLM + task factories.

* Apply suggestions from code review

---------

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* add default config to openai models (#12961)

* Docs for spacy-llm 0.5.0 (#12967)

* simplify Python example

* simplify Python example

* Refer only to latest OpenAI model versions from usage doc

* Typo fix

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>

* clarify accuracy claim

---------

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>

---------

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
2023-09-08 10:25:14 +02:00
Magdalena Aniol
cc78847688
fix training.batch_size example (#12963) 2023-09-06 16:38:13 +02:00
Sofie Van Landeghem
6d1f6d9a23
Fix LLM usage example (#12950)
* fix usage example

* revert back to v2 to allow hot fix on main
2023-09-04 09:05:50 +02:00
Adriane Boyd
76a9f9c6c6
Docs: clarify abstract spacy.load examples (#12889) 2023-08-16 17:28:34 +02:00
Arman Mohammadi
07407e07ab
fix the regular expression matching on the full text (#12883)
There was a mistake in the regex pattern which caused not matching all the desired tokens. The problem was that when we use r string literal prefix to suppose a raw text, we should not use two backslashes to demonstrate a backslash.
2023-08-02 16:52:26 +02:00
Adriane Boyd
0fe43f40f1
Support registered vectors (#12492)
* Support registered vectors

* Format

* Auto-fill [nlp] on load from config and from bytes/disk

* Only auto-fill [nlp]

* Undo all changes to Language.from_disk

* Expand BaseVectors

These methods are needed in various places for training and vector
similarity.

* isort

* More linting

* Only fill [nlp.vectors]

* Update spacy/vocab.pyx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Revert changes to test related to auto-filling [nlp]

* Add vectors registry

* Rephrase error about vocab methods for vectors

* Switch to dummy implementation for BaseVectors.to_ops

* Add initial draft of docs

* Remove example from BaseVectors docs

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/docs/api/basevectors.mdx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Fix type and lint bpemb example

* Update website/docs/api/basevectors.mdx

---------

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-08-01 15:46:08 +02:00
Sofie Van Landeghem
c9e9dccf79
Add displaCy data structures to docs (2) (#12875)
* Add data structures to docs

* Adjusted descriptions for more consistency

* Add _optional_ flag to parameters

* Add tests and adjust optional title key in doc

* Add title to dep visualizations

* fix typo

---------

Co-authored-by: thomashacker <EdwardSchmuhl@web.de>
2023-07-31 10:47:57 +02:00
Victoria
e2b89012a2
Add spacy-llm docs to website (#12782)
* initial commit

* update for v0.4.0

* Apply suggestions from code review

* Fix formatting

* Apply suggestions from code review

* Update website/docs/api/large-language-models.mdx

* Update website/docs/api/large-language-models.mdx

* update usage page

* Apply suggestions from review

* Apply suggestions from review

* fix links

* fix relative links

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Apply suggestions from review

* Add section on Llama 2. Format.

---------

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-07-24 14:44:47 +02:00
Adriane Boyd
95075298f5
Update pex Makefile defaults (#12832)
* Update pex Makefile defaults

- switch to python 3.8
- only install spacy-lookups-data for extra packages

* Update website for pex defaults
2023-07-18 09:29:04 +02:00
Sofie Van Landeghem
ddffd09602
Trainable lemmatizer docs link (#12795)
* add an anchor to the trainable lemmatizer section

* add requirement for morphologizer,tagger to rule-based lemmatizer

* morphologizer only
2023-07-07 15:18:16 +02:00
Adriane Boyd
4e19ec7eb8
Docs for v3.6.0 (#12792)
* Docs for v3.6.0

* Add sl performance

* Add da trf note
2023-07-06 12:58:25 +02:00
Daniël de Kok
57a230c6e4
Remove section about parallel training with Ray (#12770)
The Ray integration is currently broken, having these docs around
suggest that this functionality is currently available.
2023-06-28 17:09:57 +02:00
Victoria
6930a6bf45
Add spaCy VSCode extension materials (#12592) 2023-05-19 14:38:53 +02:00
Basile Dura
2dd8825f09
docs: add comment on offset_x argument (#12630) 2023-05-15 11:42:47 +02:00
TAN Long
119f959218
docs(REL_OP): modify docs for REL_OPs to match Semgrex's update on CoreNLP v4.5.2 (#12531)
Co-authored-by: Tan Long <tanloong@foxmail.com>
2023-04-17 13:14:01 +02:00
Madeesh Kannan
6db20b354f
Docs: Fix rule-based matching example that expands named entities (#12495) 2023-04-06 11:45:58 +02:00
Edward
c95d320d28
Add more information to custom code docs (#12491)
* Add info to sections

* Update website/docs/usage/training.mdx

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-04-06 11:45:19 +02:00
Edward
de32011e4c
Add model-last saving mechanism to pretraining (#12459)
* Adjust pretrain command

* chane naming and add finally block

* Add unit test

* Add unit test assertions

* Update spacy/training/pretrain.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* change finally block

* Add to docs

* Update website/docs/usage/embeddings-transformers.mdx

* Add flag to skip saving model-last

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-04-03 15:24:03 +02:00
kadarakos
56aa0cc75f
Displacy doc fix (#12352)
* more details for color setting

* more details for color setting

* prettier
2023-03-01 15:38:23 +01:00
Adriane Boyd
33864f1d07
Add new tags in docs for #12334 (#12348) 2023-03-01 10:46:13 +01:00
TAN Long
071667376a
Add new REL_OPs: >+, >-, <+, and <- (#12334)
* Add immediate left/right child/parent dependency relations

* Add tests for new REL_OPs: `>+`, `>-`, `<+`, and `<-`.

---------

Co-authored-by: Tan Long <tanloong@foxmail.com>
2023-02-28 14:36:33 +01:00
Adriane Boyd
4539fbae17
Revert "Fix FUZZY operator definition (#12318)" (#12336)
This reverts commit daedc45d05.

The default length depends on the length of the pattern string and was
correct for this example.
2023-02-27 09:48:36 +01:00
andyjessen
daedc45d05
Fix FUZZY operator definition (#12318)
* Fix FUZZY operator definition

The default length of the FUZZY operator is 2 and not 3.

* adjust edit distance in matcher usage docs too

---------

Co-authored-by: svlandeg <svlandeg@github.com>
2023-02-23 09:37:40 +01:00
Raphael Mitsch
2d4fb94ba0
Fix wrong file name in docs for rule-based matcher. (#12262) 2023-02-09 12:58:14 +01:00
Sofie Van Landeghem
bd739e67d6
explain KB change and how to remedy (#12189) 2023-01-27 15:13:20 +01:00
Marcus Blättermann
031f6c7b60
WEB-27 Add alt tags to images (#12166)
* Update spaCy badge `alt` text

* Add `next/image` component to Universe

* Add missing `alt`texts
2023-01-24 13:56:14 +01:00
Edward
e9048fd4a1
Add how to load probability tables to existing models to spaCy docs (#12051)
* add section about adding tables to models

* change to lexeme_norm

* Change syntax

* change to _prob

* Update website/docs/usage/saving-loading.mdx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-01-24 10:01:22 +01:00
Sofie Van Landeghem
0f5d8a27f2
3.5 usage page (#12057)
* skeleton

* Fill in non-CLI details from release notes draft

* Add TODO for fuzzy matching

* Website updates for v3-5 draft

* Fill in usage examples

* Add fuzzy matching to intro

* Fix fuzzy examples

* Shell example formatting

* Fix typo

* Format

* Remove trailing periods in internal list

* Update

* Fix spacing for nested lists

* Update InMemoryLookupKB link

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Ines Montani <ines@ines.io>
2023-01-19 16:13:04 +01:00
Adriane Boyd
3b8918e166
API docs: Rename kb_in_memory to inmemorylookupkb, add to sidebar (#12128)
* API docs: Rename kb_in_memory to inmemorylookupkb, add to sidebar

* adjust to mdx

* linkout to InMemoryLookupKB at first occurrence in kb.mdx

* fix links to docs

* revert Azure trigger setting (I'll make a separate PR)

Co-authored-by: svlandeg <svlandeg@github.com>
2023-01-19 13:29:17 +01:00