Commit Graph

3195 Commits

Author SHA1 Message Date
Ines Montani
3e30b5bef6 Add spacy-layout [ci skip] 2024-11-19 10:43:40 +01:00
Matthew Honnibal
3ecec1324c
Usage page on memory management, explaining memory zones and doc_cleaner (#13643) [ci skip]
Co-authored-by: Ines Montani <ines@ines.io>
2024-10-23 12:42:54 +02:00
Ikko Eltociear Ashimine
15fbf5ef36
docs: update rule-based-matching.mdx (#13665) [ci skip] 2024-10-23 12:07:01 +02:00
Sergei Pashakhin
1ee9a19059
Fix typo (#13657) [ci skip] 2024-10-23 12:06:36 +02:00
thjbdvlt
0d7e57fc3e
universe-pipeline-solipCysme-french (#13627) [ci skip] 2024-10-11 11:26:15 +02:00
Ines Montani
ae5c3e078d Fix universe.json [ci skip] 2024-10-11 11:24:42 +02:00
aravind-mc
44d1906453
Update universe.json to add my spaCy online course (#13632) [ci skip] 2024-10-11 11:21:57 +02:00
Ines Montani
10a6f508ab Fix landing banner links [ci skip] 2024-10-11 11:19:10 +02:00
William Mattingly
30f1f33e78
Added Date spaCy to universe (#13415) [ci skip]
Co-authored-by: Ines Montani <ines@ines.io>
2024-09-10 14:29:03 +02:00
William Mattingly
f1a5ff9dba
added spacy whisper to universe (#13418) [ci skip]
Co-authored-by: Ines Montani <ines@ines.io>
2024-09-10 14:28:00 +02:00
William Mattingly
c80dacd046
added spacy annoy to universe (#13416) [ci skip]
Co-authored-by: Ines Montani <ines@ines.io>
2024-09-10 14:26:21 +02:00
William Mattingly
7fbbb2002a
updated universe for number spacy (#13424) [ci skip]
Co-authored-by: Ines Montani <ines@ines.io>
2024-09-10 14:25:23 +02:00
William Mattingly
89c1774d43
added bagpipes-spacy to universe (#13425) [ci skip]
Co-authored-by: Ines Montani <ines@ines.io>
2024-09-10 14:24:06 +02:00
thjbdvlt
081e4e385d
universe-project-presque (#13515) [ci skip]
Co-authored-by: Ines Montani <ines@ines.io>
2024-09-10 14:21:41 +02:00
thjbdvlt
0190e669c5
universe-package-quelquhui (#13514) [ci skip]
Co-authored-by: Ines Montani <ines@ines.io>
2024-09-10 14:17:33 +02:00
Oren Halvani
54dc4ee8fb
Added: Constituent-Treelib to: universe.json (#13432) [ci skip]
Co-authored-by: Halvani <>
2024-09-10 14:13:36 +02:00
William Mattingly
5a7ad5572c
added gliner-spacy to universe (#13417) [ci skip]
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Ines Montani <ines@ines.io>
2024-09-10 14:12:52 +02:00
ykyogoku
608f65ce40
add Tibetan (#13510) 2024-09-09 11:18:03 +02:00
Muzaffer Cikay
acbf2a428f
Add Kurdish Kurmanji language (#13561)
* Add Kurdish Kurmanji language

* Add lex_attrs
2024-09-09 11:15:40 +02:00
Ines Montani
8cda27aefa Add case study [ci skip] 2024-06-26 09:41:23 +02:00
Sofie Van Landeghem
c195ca4f9c
fix docs for MorphAnalysis.__contains__ (#13433) 2024-05-02 16:46:41 +02:00
Alex Strick van Linschoten
045cd43c3f
Fix typos in docs (#13466)
* fix typos

* prettier formatting

---------

Co-authored-by: svlandeg <svlandeg@github.com>
2024-04-29 11:10:17 +02:00
Sofie Van Landeghem
2e2334632b
Fix use_gold_ents behaviour for EntityLinker (#13400)
* fix type annotation in docs

* only restore entities after loss calculation

* restore entities of sample in initialization

* rename overfitting function

* fix EL scorer

* Relax test

* fix formatting

* Update spacy/pipeline/entity_linker.py

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>

* rename to _ensure_ents

* further rename

* allow for scorer to be None

---------

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
2024-04-16 12:00:22 +02:00
Yaseen
21aea59001
Update code.module.sass to make code title sticky (#13379) 2024-03-26 12:15:25 +01:00
Ines Montani
1252370f69 Move DocSearch key to env var [ci skip] 2024-03-25 10:17:57 +01:00
Daniël de Kok
14bd9d89a3
Update example that shows model in requirements (#13302)
See #13293.
2024-02-11 19:46:43 +01:00
Eliana Vornov
00e938a7c3
add custom code support to CLI speed benchmark (#13247)
* add custom code support to CLI speed benchmark

* sort imports

* better copying for warmup docs
2024-01-26 13:29:22 +01:00
Sofie Van Landeghem
68b85ea950
Clarify data_path loading for apply CLI command (#13272)
* attempt to clarify additional annotations on .spacy file

* suggestion by Daniël

* pipeline instead of pipe
2024-01-26 12:10:05 +01:00
Sofie Van Landeghem
7496e03a2c
Clarify vocab docs (#13273)
* add line to ensure that apple is in fact in the vocab

* add that the vocab may be empty
2024-01-26 10:58:48 +01:00
Sofie Van Landeghem
a493981163
fix typo (#13254) 2024-01-24 09:29:57 +01:00
Raphael Mitsch
575c405ae3 Fix LLM docs on task factories. 2024-01-19 16:48:54 +01:00
Raphael Mitsch
256468c414 Merge branch 'docs/llm_main' into chore/sync-master-with-llm_main
# Conflicts:
#	website/docs/api/large-language-models.mdx
2024-01-19 16:34:35 +01:00
Raphael Mitsch
91c24c0285
Merge pull request #13251 from explosion/docs/llm_develop
Sync `docs/llm_main` with `docs/llm_develop`
2024-01-19 12:56:38 +01:00
Raphael Mitsch
0062c22c35
Updated docs w.r.t. infinite doc length changes (#13214)
* Updated docs w.r.t. infinite doc length.

* Fix typo.

* fix typos

* Fix table formatting.

* Update formatting.

---------

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2024-01-05 14:20:58 +01:00
Daniël de Kok
e2a3952de5
Add spacy.TextCatParametricAttention.v1 (#13201)
* Add spacy.TextCatParametricAttention.v1

This layer is a simplification of the ensemble classifier that
only uses parametric attention. We have found empirically that with a
sufficient amount of training data, using the ensemble classifier with
BoW does not provide significant improvement in classifier accuracy.
However, plugging in a BoW classifier does reduce GPU training and
inference performance substantially, since it uses a GPU-only kernel.

* Fix merge fallout
2024-01-02 10:03:06 +01:00
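The commit above describes a text classifier that pools token vectors using only parametric attention. As a rough illustration of that pooling step, here is a pure-Python sketch, assuming a single learned query vector `q` (a hypothetical parameter) whose dot product with each token vector is normalized with a softmax. This is not spaCy's or Thinc's implementation, only the general technique.

```python
import math
from typing import List

Vector = List[float]


def parametric_attention_pool(tokens: List[Vector], q: Vector) -> Vector:
    """Pool token vectors into one vector using attention weights derived
    from a learned query vector q (hypothetical parameter, for illustration)."""
    if not tokens:
        raise ValueError("need at least one token vector")
    # Score each token by its dot product with the query vector.
    scores = [sum(t_i * q_i for t_i, q_i in zip(tok, q)) for tok in tokens]
    # Softmax over tokens; subtracting the max score keeps exp() stable.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the token vectors, one output value per dimension.
    width = len(tokens[0])
    return [sum(w * tok[j] for w, tok in zip(weights, tokens)) for j in range(width)]


print(parametric_attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))  # [0.5, 0.5]
```

With a zero query every token receives the same weight, so the result reduces to mean pooling; in a trained model `q` would be learned so that informative tokens receive larger weights.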
Daniël de Kok
7ebba86402
Add TextCatReduce.v1 (#13181)
* Add TextCatReduce.v1

This is a textcat classifier that pools the vectors generated by a
tok2vec implementation and then applies a classifier to the pooled
representation. Three reductions are supported for pooling: first, max,
and mean. When multiple reductions are enabled, the reductions are
concatenated before providing them to the classification layer.

This model is a generalization of the TextCatCNN model, which only
supports mean reductions and is a bit of a misnomer, because it can also
be used with transformers. This change also reimplements TextCatCNN.v2
using the new TextCatReduce.v1 layer.

* Doc fixes

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Fully specify `TextCatCNN` <-> `TextCatReduce` equivalence

* Move TextCatCNN docs to legacy, in prep for moving to spacy-legacy

* Add back a test for TextCatCNN.v2

* Replace TextCatCNN in pipe configurations and templates

* Add an infobox to the `TextCatReduce` section with a `TextCatCNN` anchor

* Add last reduction (`use_reduce_last`)

* Remove non-working TextCatCNN Netlify redirect

* Revert layer changes for the quickstart

* Revert one more quickstart change

* Remove unused import

* Fix docstring

* Fix setting name in error message

---------

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-12-21 11:00:06 +01:00
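TextCatReduce.v1, as described above, pools tok2vec output with one or more reductions and concatenates the results before classification. A minimal pure-Python sketch of that pooling for a single document, assuming token vectors given as a list of lists; the flag names mirror `use_reduce_last` from the commit, while the other flag names and the concatenation order are assumptions, not spaCy's code.

```python
from typing import List

Vector = List[float]


def reduce_tokens(
    tokens: List[Vector],
    use_reduce_first: bool = False,
    use_reduce_last: bool = False,
    use_reduce_max: bool = False,
    use_reduce_mean: bool = True,
) -> Vector:
    """Pool token vectors with the enabled reductions and concatenate them.
    Illustrative sketch only; the order first, last, max, mean is assumed."""
    if not tokens:
        raise ValueError("need at least one token vector")
    if not (use_reduce_first or use_reduce_last or use_reduce_max or use_reduce_mean):
        raise ValueError("at least one reduction must be enabled")
    width = len(tokens[0])
    parts: List[Vector] = []
    if use_reduce_first:
        parts.append(list(tokens[0]))
    if use_reduce_last:
        parts.append(list(tokens[-1]))
    if use_reduce_max:
        parts.append([max(tok[j] for tok in tokens) for j in range(width)])
    if use_reduce_mean:
        parts.append([sum(tok[j] for tok in tokens) / len(tokens) for j in range(width)])
    # Concatenate the reductions into one representation for the classifier.
    pooled: Vector = []
    for part in parts:
        pooled.extend(part)
    return pooled


print(reduce_tokens([[1.0, 4.0], [3.0, 2.0]], use_reduce_max=True))
# [3.0, 4.0, 2.0, 3.0]  (max, then mean)
```

Enabling only the mean reduction corresponds to the mean-pooling behaviour the commit attributes to TextCatCNN; each extra reduction widens the pooled vector by one token-vector width.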
Raphael Mitsch
d56ee65ddf
Document spacy-llm's TranslationTask (#13183)
* Describe translation task.

* Fix references to examples and template.

* Format.
2023-12-11 17:41:04 +01:00
Raphael Mitsch
e79a9c5acd
Document spacy-llm's RawTask (#13180)
* Add section on RawTask.

* Fix API docs.

* Update website/docs/api/large-language-models.mdx

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

---------

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-12-11 17:14:12 +01:00
Raphael Mitsch
9fcd2bfa08
Add info on endpoint arg. (#13169) 2023-12-05 12:46:29 +01:00
Raphael Mitsch
a25a3b996b
Merge pull request #13173 from explosion/docs/llm_main
Sync `llm_develop` with `llm_main`
2023-12-04 16:46:21 +01:00
Raphael Mitsch
55ed2b4e82
Add documentation for EL task (#12988)
* Add documentation for EL task.

* Fix EL factory name.

* Add llm_entity_linker_mentio.

* Apply suggestions from code review

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Update EL task docs.

* Update EL task docs.

* Update EL task docs.

* Update EL task docs.

* Update EL task docs.

* Update EL task docs.

* Update EL task docs.

* Update EL task docs.

* Update EL task docs.

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Incorporate feedback.

* Format.

* Fix link to KB data.

---------

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2023-12-04 15:23:28 +01:00
Adriane Boyd
e467573550
Docs: update trf_data examples and pipeline design info (#13164) 2023-12-04 15:15:54 +01:00
Raphael Mitsch
0e43fca036
Add Claude-2.1 mention. (#13167) 2023-12-01 16:48:35 +01:00
Daniël de Kok
da7ad97519
Update TextCatBOW to use the fixed SparseLinear layer (#13149)
* Update `TextCatBOW` to use the fixed `SparseLinear` layer

A while ago, we fixed the `SparseLinear` layer to use all available
parameters: https://github.com/explosion/thinc/pull/754

This change updates `TextCatBOW` to `v3` which uses the new
`SparseLinear_v2` layer. This results in a sizeable improvement on a
text categorization task that was tested.

While at it, this `spacy.TextCatBOW.v3` also adds the `length_exponent`
option to make it possible to change the hidden size. Ideally, we'd just
have an option called `length`. But the way that `TextCatBOW` uses
hashes results in a non-uniform distribution of parameters when the
length is not a power of two.

* Replace TextCatBOW `length_exponent` parameter by `length`

We now round up the length to the next power of two if it isn't
a power of two.

* Remove some tests for TextCatBOW.v2

* Fix missing import
2023-11-29 09:11:54 +01:00
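The `length` handling described above (rounding up to the next power of two) can be sketched in a few lines of Python. The helper names `round_up_to_power_of_two` and `bucket` are hypothetical, and the bucket function only illustrates a general property of power-of-two table sizes (masking the low bits equals taking the modulus); it is not a description of how `TextCatBOW` hashes features.

```python
def round_up_to_power_of_two(length: int) -> int:
    """Return the smallest power of two that is >= length."""
    if length < 1:
        raise ValueError("length must be a positive integer")
    # (length - 1).bit_length() is the number of bits needed for length - 1,
    # so shifting 1 left by it gives the next power of two, or length itself
    # when length is already a power of two.
    return 1 << (length - 1).bit_length()


def bucket(hash_value: int, n_buckets: int) -> int:
    """Map a non-negative hash to a bucket; assumes n_buckets is a power of two."""
    # For a power of two, keeping the low bits equals hash_value % n_buckets.
    return hash_value & (n_buckets - 1)


print(round_up_to_power_of_two(200_000))  # 262144
```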
Ines Montani
8f69e56a5a Add swag [ci skip] 2023-11-20 14:42:01 +01:00
Lise
b6e022381d
Feature/nn and fo language extensions (#13116)
* add language extensions for norwegian nynorsk and faroese

* update docstring for nn/examples.py

* use relative imports

* add fo and nn tokenizers to pytest fixtures

* add unittests for fo and nn and fix bug in nn

* remove module docstring from fo/__init__.py

* add comments about example sentences' origin

* add license information to faroese data credit

* format unittests using black

* add __init__ files to tests/lang/nn and tests/lang/fo

* fix import order and use relative imports in fo/__init__.py and nn/__init__.py

* Make the tests a bit more compact

* Add fo and nn to website languages

* Add note about jul.

* Add "jul." as exception

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-11-20 07:49:59 +01:00
ajbond
9f2ce6bb00
Add Redfield NLP Nodes to the spaCy Universe (#13133) 2023-11-17 09:48:02 +01:00
Raphael Mitsch
b2e831d966
LLM docs: OpenAI model update (#13119)
* Update supported OpenAI models.

* Update with new GPT-3.5 and GPT-4 versions.

* Add links to OpenAI model docs.
2023-11-08 17:55:16 +01:00
Adriane Boyd
513bbd5fa3
Add preferred use of build for package CLI (#13109)
Build with `build` if available. Warn and fall back to the previous
`setup.py`-based build if the `build`-based build fails.
2023-11-08 17:35:24 +01:00
Sofie Van Landeghem
a804b83a4b
Update llm docs to clarify task-specific factories (#13082)
* fix typo

* add examples to specify custom model for task-specific factory
2023-10-31 22:07:07 +01:00