Matthew Honnibal
3ecec1324c
Usage page on memory management, explaining memory zones and doc_cleaner ( #13643 ) [ci skip]
Co-authored-by: Ines Montani <ines@ines.io>
2024-10-23 12:42:54 +02:00
Ikko Eltociear Ashimine
15fbf5ef36
docs: update rule-based-matching.mdx ( #13665 ) [ci skip]
2024-10-23 12:07:01 +02:00
Alex Strick van Linschoten
045cd43c3f
Fix typos in docs ( #13466 )
* fix typos
* prettier formatting
---------
Co-authored-by: svlandeg <svlandeg@github.com>
2024-04-29 11:10:17 +02:00
Daniël de Kok
14bd9d89a3
Update example that shows model in requirements ( #13302 )
See #13293 .
2024-02-11 19:46:43 +01:00
Raphael Mitsch
256468c414
Merge branch 'docs/llm_main' into chore/sync-master-with-llm_main
# Conflicts:
# website/docs/api/large-language-models.mdx
2024-01-19 16:34:35 +01:00
Raphael Mitsch
0062c22c35
Updated docs w.r.t. infinite doc length changes ( #13214 )
* Updated docs w.r.t. infinite doc length.
* Fix typo.
* fix typos
* Fix table formatting.
* Update formatting.
---------
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2024-01-05 14:20:58 +01:00
Raphael Mitsch
55ed2b4e82
Add documentation for EL task ( #12988 )
* Add documentation for EL task.
* Fix EL factory name.
* Add llm_entity_linker_mentio.
* Apply suggestions from code review
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
* Update EL task docs.
* Apply suggestions from code review
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Incorporate feedback.
* Format.
* Fix link to KB data.
---------
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2023-12-04 15:23:28 +01:00
Daniël de Kok
da7ad97519
Update `TextCatBOW` to use the fixed `SparseLinear` layer ( #13149 )
* Update `TextCatBOW` to use the fixed `SparseLinear` layer
A while ago, we fixed the `SparseLinear` layer to use all available
parameters: https://github.com/explosion/thinc/pull/754
This change updates `TextCatBOW` to `v3` which uses the new
`SparseLinear_v2` layer. This results in a sizeable improvement on a
text categorization task that was tested.
While at it, this `spacy.TextCatBOW.v3` also adds the `length_exponent`
option to make it possible to change the hidden size. Ideally, we'd just
have an option called `length`. But the way that `TextCatBOW` uses
hashes results in a non-uniform distribution of parameters when the
length is not a power of two.
* Replace TextCatBOW `length_exponent` parameter with `length`
We now round up the length to the next power of two if it isn't
a power of two.
* Remove some tests for TextCatBOW.v2
* Fix missing import
2023-11-29 09:11:54 +01:00
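The rounding rule from the `length` bullet can be illustrated in plain Python. This is a minimal sketch of the arithmetic only; `round_up_to_power_of_two` is a hypothetical helper name, not a function from spaCy or Thinc, and it assumes `length` is a positive integer.

```python
def round_up_to_power_of_two(length: int) -> int:
    """Return `length` if it is a power of two, otherwise the next power of two above it."""
    if length < 1:
        raise ValueError("length must be a positive integer")
    # A power of two has exactly one bit set, so clearing its lowest set bit gives 0.
    if length & (length - 1) == 0:
        return length
    # bit_length() is the number of bits needed to represent `length`;
    # shifting 1 left by that amount gives the next power of two.
    return 1 << length.bit_length()
```

For example, `262144` (2**18) is kept as-is, while `300000` is rounded up to `524288` (2**19).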
Adriane Boyd
513bbd5fa3
Add preferred use of build for package CLI ( #13109 )
Build with `build` if it is available. If the `build`-based build fails,
warn and fall back to the previous `setup.py`-based build.
2023-11-08 17:35:24 +01:00
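A minimal sketch of the try-then-fall-back behaviour described in this commit. It is written against two hypothetical builder callables rather than the real internals of the `package` CLI, which are not shown here; a missing `build` module and a failed build are both treated as "use the fallback".

```python
import warnings
from typing import Callable


def build_with_fallback(
    build_with_build: Callable[[], str],
    build_with_setup_py: Callable[[], str],
) -> str:
    """Try the `build`-based builder first; on any failure, warn and use setup.py."""
    try:
        return build_with_build()
    except Exception as err:  # e.g. `build` not installed, or the build itself failed
        warnings.warn(
            f"Building with `build` failed ({err!r}); falling back to setup.py.",
            RuntimeWarning,
        )
        return build_with_setup_py()
```

Catching a broad `Exception` keeps the sketch simple; a real implementation would likely distinguish "not installed" from "build failed" to give a more specific warning.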
Sofie Van Landeghem
48248c62b6
Clarify EL example in docs ( #13071 )
* add comment that pipeline is a custom one
* add link to NEL tutorial
* prettier
* revert prettier reformat
* revert prettier reformat (2)
* fix typo
Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
---------
Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
2023-10-31 21:58:29 +01:00
Ines Montani
b83f1e3724
Inline displaCy visualizations in docs ( #13050 ) [ci skip]
2023-10-06 14:22:43 +02:00
Raphael Mitsch
be29216fe2
Merge pull request #13044 from explosion/docs/llm_main
Sync `master` with `docs/llm_main`
2023-10-05 16:10:19 +02:00
Raphael Mitsch
1162fcf099
Add Mistral mentions. ( #13037 )
2023-10-05 14:44:38 +02:00
Raphael Mitsch
862f8254e8
Add docs on Azure OpenAI support in spacy-llm ( #13043 )
* Add gpt-3.5-turbo-instruct to list of supported OpenAI models.
* Update `spacy-llm` task argument docs w.r.t. task refactoring (#12995 )
* Update task arguments w.r.t. task refactoring in 0.5.0.
* Add disclaimer w.r.t. gated models/Llama 2.
* Update website/docs/api/large-language-models.mdx
* Update website/docs/api/large-language-models.mdx
* Update docs w.r.t. PaLM support. (#13018 )
* Add info on spacy.Azure.v1.
* Attempt to fix netlify check fails.
* Format.
2023-10-05 13:18:27 +02:00
Raphael Mitsch
1dec138e61
Update docs w.r.t. PaLM support. ( #13018 )
2023-10-05 08:50:41 +02:00
Adriane Boyd
6e54360a3d
Remove pathy dependency, update docs for cloudpathlib in Weasel ( #13035 )
2023-10-05 08:50:22 +02:00
Adriane Boyd
160e61772e
Docs for v3.7.0 ( #13029 )
* Docs for v3.7.0
* Minor fixes
* Extend Weasel notes
* Minor edits
* Update version in README
2023-10-01 21:40:07 +02:00
Adriane Boyd
406794a081
Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.7-1
2023-09-28 15:09:06 +02:00
Madeesh Kannan
b4501db6f8
Update emoji library in rule-based matcher example ( #13014 )
2023-09-25 18:20:30 +02:00
Adriane Boyd
ff4215f1c7
Drop support for python 3.6 ( #13009 )
* Drop support for python 3.6
* Update docs
2023-09-25 14:48:38 +02:00
Sofie Van Landeghem
8f0d6b0a8c
Fix in BertTokenizer docs ( #12955 )
* fix BertWordPieceTokenizer constructor call
* fix
* Update website/docs/usage/linguistic-features.mdx
---------
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-09-13 13:21:58 +02:00
Sofie Van Landeghem
def7013eec
Docs for spacy-llm 0.5.0 ( #12968 )
* Update incorrect example config. (#12893 )
* spacy-llm docs cleanup (#12945 )
* Shorten NER section
* fix template references
* simplify sections
* set temperature to 0.0 in examples
* condense model information
* fix parameters for REST models
* set temperature to 0.0
* spelling fix
* trigger preview
* fix quotes
* add small note on noop.v1
* move up example noop config
* set appropriate model example configs
* explain config
* fix
Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
---------
Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
* Docs for ner.v3 and spancat.v3 spacy-llm tasks (#12949 )
* formatting
* update usage table with NER.v3
* fix typo in links
* v3 overview of parameters
* add spancat.v3
* add further v3 explanations
* remove TODO comment
* few more small fixes
* Add doc section on LLM + task factories (#12905 )
* Add section on LLM + task factories.
* Apply suggestions from code review
---------
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* add default config to openai models (#12961 )
* Docs for spacy-llm 0.5.0 (#12967 )
* simplify Python example
* simplify Python example
* Refer only to latest OpenAI model versions from usage doc
* Typo fix
Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
* clarify accuracy claim
---------
Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
---------
Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
2023-09-08 10:25:14 +02:00
Magdalena Aniol
cc78847688
fix training.batch_size example ( #12963 )
2023-09-06 16:38:13 +02:00
Sofie Van Landeghem
6d1f6d9a23
Fix LLM usage example ( #12950 )
* fix usage example
* revert back to v2 to allow hot fix on main
2023-09-04 09:05:50 +02:00
Adriane Boyd
76a9f9c6c6
Docs: clarify abstract spacy.load examples ( #12889 )
2023-08-16 17:28:34 +02:00
Arman Mohammadi
07407e07ab
fix the regular expression matching on the full text ( #12883 )
There was a mistake in the regex pattern that caused it to miss some of the desired tokens. When the `r` prefix is used to write a raw string, a backslash should be written once, not doubled.
2023-08-02 16:52:26 +02:00
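The raw-string pitfall behind this fix can be shown with Python's `re` module. This is a generic illustration, not the actual pattern from the spaCy docs: inside a raw string, doubling the backslash changes what the regex engine is asked to match.

```python
import re

text = r"Price: 42 dollars, path C:\data"

# In a raw string, r"\d" passes backslash + "d" to the regex engine,
# which reads it as "any digit".
digits = re.findall(r"\d+", text)

# Doubling the backslash inside a raw string, r"\\d", asks the regex engine
# for a literal backslash followed by the letter "d" instead.
literal = re.findall(r"\\d", text)
```

Here `digits` is `["42"]`, while `literal` only matches the backslash-plus-`d` inside `C:\data`.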
Adriane Boyd
0fe43f40f1
Support registered vectors ( #12492 )
* Support registered vectors
* Format
* Auto-fill [nlp] on load from config and from bytes/disk
* Only auto-fill [nlp]
* Undo all changes to Language.from_disk
* Expand BaseVectors
These methods are needed in various places for training and vector
similarity.
* isort
* More linting
* Only fill [nlp.vectors]
* Update spacy/vocab.pyx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Revert changes to test related to auto-filling [nlp]
* Add vectors registry
* Rephrase error about vocab methods for vectors
* Switch to dummy implementation for BaseVectors.to_ops
* Add initial draft of docs
* Remove example from BaseVectors docs
* Apply suggestions from code review
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update website/docs/api/basevectors.mdx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Fix type and lint bpemb example
* Update website/docs/api/basevectors.mdx
---------
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-08-01 15:46:08 +02:00
Sofie Van Landeghem
c9e9dccf79
Add displaCy data structures to docs (2) ( #12875 )
* Add data structures to docs
* Adjusted descriptions for more consistency
* Add _optional_ flag to parameters
* Add tests and adjust optional title key in doc
* Add title to dep visualizations
* fix typo
---------
Co-authored-by: thomashacker <EdwardSchmuhl@web.de>
2023-07-31 10:47:57 +02:00
Victoria
e2b89012a2
Add spacy-llm docs to website ( #12782 )
* initial commit
* update for v0.4.0
* Apply suggestions from code review
* Fix formatting
* Apply suggestions from code review
* Update website/docs/api/large-language-models.mdx
* Update website/docs/api/large-language-models.mdx
* update usage page
* Apply suggestions from review
* Apply suggestions from review
* fix links
* fix relative links
* Apply suggestions from code review
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Apply suggestions from review
* Add section on Llama 2. Format.
---------
Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-07-24 14:44:47 +02:00
Adriane Boyd
95075298f5
Update pex Makefile defaults ( #12832 )
* Update pex Makefile defaults
- switch to python 3.8
- only install spacy-lookups-data for extra packages
* Update website for pex defaults
2023-07-18 09:29:04 +02:00
Sofie Van Landeghem
ddffd09602
Trainable lemmatizer docs link ( #12795 )
* add an anchor to the trainable lemmatizer section
* add requirement for morphologizer,tagger to rule-based lemmatizer
* morphologizer only
2023-07-07 15:18:16 +02:00
Adriane Boyd
4e19ec7eb8
Docs for v3.6.0 ( #12792 )
* Docs for v3.6.0
* Add sl performance
* Add da trf note
2023-07-06 12:58:25 +02:00
Daniël de Kok
57a230c6e4
Remove section about parallel training with Ray ( #12770 )
The Ray integration is currently broken; keeping these docs around
suggests that this functionality is available.
2023-06-28 17:09:57 +02:00
Victoria
6930a6bf45
Add spaCy VSCode extension materials ( #12592 )
2023-05-19 14:38:53 +02:00
Basile Dura
2dd8825f09
docs: add comment on `offset_x` argument ( #12630 )
2023-05-15 11:42:47 +02:00
TAN Long
119f959218
docs(REL_OP): modify docs for REL_OPs to match Semgrex's update on CoreNLP v4.5.2 ( #12531 )
Co-authored-by: Tan Long <tanloong@foxmail.com>
2023-04-17 13:14:01 +02:00
Madeesh Kannan
6db20b354f
Docs: Fix rule-based matching example that expands named entities ( #12495 )
2023-04-06 11:45:58 +02:00
Edward
c95d320d28
Add more information to custom code docs ( #12491 )
* Add info to sections
* Update website/docs/usage/training.mdx
---------
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-04-06 11:45:19 +02:00
Edward
de32011e4c
Add model-last saving mechanism to pretraining ( #12459 )
* Adjust pretrain command
* change naming and add finally block
* Add unit test
* Add unit test assertions
* Update spacy/training/pretrain.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* change finally block
* Add to docs
* Update website/docs/usage/embeddings-transformers.mdx
* Add flag to skip saving model-last
---------
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-04-03 15:24:03 +02:00
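This commit saves `model-last` from a `finally` block and adds a flag to skip it, so presumably the final snapshot is written even when pretraining stops early. A minimal sketch of that pattern follows. All names (`pretrain_sketch`, `train_one_epoch`, `save_model`, `skip_last`) are hypothetical stand-ins, not spaCy's pretraining code or its actual CLI flag.

```python
from pathlib import Path
from typing import Callable, Iterable


def pretrain_sketch(
    epochs: Iterable[int],
    train_one_epoch: Callable[[int], None],
    save_model: Callable[[Path], None],
    output_dir: Path,
    skip_last: bool = False,
) -> None:
    """Run training epochs and save a final 'model-last' snapshot unless skipped."""
    try:
        for epoch in epochs:
            train_one_epoch(epoch)
            # A real loop would also write per-epoch checkpoints here.
    finally:
        # Runs whether the loop finished normally or raised (including
        # KeyboardInterrupt); the exception still propagates afterwards.
        if not skip_last:
            save_model(output_dir / "model-last")
```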
kadarakos
56aa0cc75f
Displacy doc fix ( #12352 )
* more details for color setting
* more details for color setting
* prettier
2023-03-01 15:38:23 +01:00
Adriane Boyd
33864f1d07
Add new tags in docs for #12334 ( #12348 )
2023-03-01 10:46:13 +01:00
TAN Long
071667376a
Add new REL_OPs: `>+`, `>-`, `<+`, and `<-` ( #12334 )
* Add immediate left/right child/parent dependency relations
* Add tests for new REL_OPs: `>+`, `>-`, `<+`, and `<-`.
---------
Co-authored-by: Tan Long <tanloong@foxmail.com>
2023-02-28 14:36:33 +01:00
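The four operators add "immediate left/right child/parent" relations. The sketch below models them over a plain list of head indices, assuming `>`/`<` mean "A is the immediate head of B" / "B is the immediate head of A", and `+`/`-` mean B lies to the right/left of A. It is an illustration of that assumed reading, not `DependencyMatcher` code; the matcher's API docs are authoritative.

```python
from typing import Sequence


def rel_op_matches(op: str, a: int, b: int, heads: Sequence[int]) -> bool:
    """Check `A op B` for the four immediate-head operators.

    heads[i] is the index of token i's syntactic head (the root points to itself).
    """
    if op == ">+":  # A is B's immediate head and B is to the right of A
        return heads[b] == a and a < b
    if op == ">-":  # A is B's immediate head and B is to the left of A
        return heads[b] == a and b < a
    if op == "<+":  # B is A's immediate head and B is to the right of A
        return heads[a] == b and a < b
    if op == "<-":  # B is A's immediate head and B is to the left of A
        return heads[a] == b and b < a
    raise ValueError(f"unsupported operator: {op}")


# "She ate pizza": both "She" (0) and "pizza" (2) attach to "ate" (1), the root.
heads = [1, 1, 1]
```

In spaCy itself these operators would appear as the `REL_OP` value in a `DependencyMatcher` pattern; the sketch only models the head and position check.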
Adriane Boyd
4539fbae17
Revert "Fix FUZZY operator definition ( #12318 )" ( #12336 )
This reverts commit daedc45d05.
The default length depends on the length of the pattern string and was
correct for this example.
2023-02-27 09:48:36 +01:00
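The revert note says the default number of edits allowed by `FUZZY` depends on the length of the pattern string. The sketch below assumes a rule of `max(2, round(0.3 * len(pattern)))`, which is our reading of spaCy 3.5's fuzzy-matching helper; treat the formula as an assumption and confirm it against the library before relying on it. The distance is plain Levenshtein (a transposition costs two edits).

```python
def levenshtein(s: str, t: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (cs != ct),  # substitution (free if characters match)
            ))
        prev = curr
    return prev[-1]


def default_max_edits(pattern: str) -> int:
    # ASSUMPTION: at least 2 edits, growing to roughly 30% of the pattern length.
    return max(2, round(0.3 * len(pattern)))


def fuzzy_match(text: str, pattern: str, max_edits: int = -1) -> bool:
    """Match if the edit distance is within an explicit or length-based limit."""
    limit = max_edits if max_edits >= 0 else default_max_edits(pattern)
    return levenshtein(text, pattern) <= limit
```

Under this assumed rule a 10-character pattern such as `"definitely"` allows 3 edits, while short patterns never drop below 2.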
andyjessen
daedc45d05
Fix FUZZY operator definition ( #12318 )
* Fix FUZZY operator definition
The default length of the FUZZY operator is 2 and not 3.
* adjust edit distance in matcher usage docs too
---------
Co-authored-by: svlandeg <svlandeg@github.com>
2023-02-23 09:37:40 +01:00
Raphael Mitsch
2d4fb94ba0
Fix wrong file name in docs for rule-based matcher. ( #12262 )
2023-02-09 12:58:14 +01:00
Sofie Van Landeghem
bd739e67d6
explain KB change and how to remedy ( #12189 )
2023-01-27 15:13:20 +01:00
Marcus Blättermann
031f6c7b60
WEB-27 Add `alt` tags to images ( #12166 )
* Update spaCy badge `alt` text
* Add `next/image` component to Universe
* Add missing `alt` texts
2023-01-24 13:56:14 +01:00
Edward
e9048fd4a1
Add how to load probability tables to existing models to spaCy docs ( #12051 )
* add section about adding tables to models
* change to lexeme_norm
* Change syntax
* change to _prob
* Update website/docs/usage/saving-loading.mdx
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-01-24 10:01:22 +01:00
Sofie Van Landeghem
0f5d8a27f2
3.5 usage page ( #12057 )
* skeleton
* Fill in non-CLI details from release notes draft
* Add TODO for fuzzy matching
* Website updates for v3-5 draft
* Fill in usage examples
* Add fuzzy matching to intro
* Fix fuzzy examples
* Shell example formatting
* Fix typo
* Format
* Remove trailing periods in internal list
* Update
* Fix spacing for nested lists
* Update InMemoryLookupKB link
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Ines Montani <ines@ines.io>
2023-01-19 16:13:04 +01:00
Adriane Boyd
3b8918e166
API docs: Rename kb_in_memory to inmemorylookupkb, add to sidebar ( #12128 )
* API docs: Rename kb_in_memory to inmemorylookupkb, add to sidebar
* adjust to mdx
* linkout to InMemoryLookupKB at first occurrence in kb.mdx
* fix links to docs
* revert Azure trigger setting (I'll make a separate PR)
Co-authored-by: svlandeg <svlandeg@github.com>
2023-01-19 13:29:17 +01:00