Commit Graph

16081 Commits

Author SHA1 Message Date
Sofie Van Landeghem
56fc3bc0f3
Type documentation fixes for Doc (#13187)
* correct char_span output type - can be None

* unify type of exclude parameter

* black

* further fixes to from_dict and to_dict

* formatting
2023-12-18 09:00:47 +01:00
Ines Montani
7df328fbfe
Update README.md [ci skip] 2023-12-12 10:19:57 +01:00
Ines Montani
8cfccdd2f8
Update links [ci skip] 2023-12-11 15:51:43 +01:00
Ines Montani
f78b91c03b
Update links [ci skip] 2023-12-11 15:51:01 +01:00
Adriane Boyd
e467573550
Docs: update trf_data examples and pipeline design info (#13164) 2023-12-04 15:15:54 +01:00
Daniël de Kok
da7ad97519
Update TextCatBOW to use the fixed SparseLinear layer (#13149)
* Update `TextCatBOW` to use the fixed `SparseLinear` layer

A while ago, we fixed the `SparseLinear` layer to use all available
parameters: https://github.com/explosion/thinc/pull/754

This change updates `TextCatBOW` to `v3` which uses the new
`SparseLinear_v2` layer. This results in a sizeable improvement on a
text categorization task that was tested.

While at it, this `spacy.TextCatBOW.v3` also adds the `length_exponent`
option to make it possible to change the hidden size. Ideally, we'd just
have an option called `length`. But the way that `TextCatBOW` uses
hashes results in a non-uniform distribution of parameters when the
length is not a power of two.

* Replace TextCatBOW `length_exponent` parameter with `length`

We now round up the length to the next power of two if it isn't
a power of two.

* Remove some tests for TextCatBOW.v2

* Fix missing import
2023-11-29 09:11:54 +01:00
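The rounding rule described in this commit can be sketched in Python as below: a `length` that is not a power of two is rounded up to the next power of two. The helper names `round_up_to_power_of_two` and `hash_bucket` are hypothetical and are not spaCy's own code. The bitmask lookup in `hash_bucket` is an assumption, included only to illustrate why a power-of-two size lets hashes reach every row evenly.

```python
def round_up_to_power_of_two(length: int) -> int:
    """Return the smallest power of two >= length (hypothetical helper)."""
    if length < 1:
        raise ValueError("length must be a positive integer")
    # (length - 1).bit_length() is the number of bits needed to index `length` rows
    return 1 << (length - 1).bit_length()


def hash_bucket(key_hash: int, length: int) -> int:
    """Map a hash to a row with a bitmask (illustrative assumption).

    With a power-of-two length, masking keeps only the low bits, so every
    row in range(length) is reachable and, for uniform hashes, equally likely.
    """
    assert length & (length - 1) == 0, "length must be a power of two"
    return key_hash & (length - 1)


print(round_up_to_power_of_two(1000))  # 1024
```

A non-power-of-two length such as 1000 would leave some rows unused under a bitmask scheme, which is one way the "non-uniform distribution of parameters" mentioned above can arise.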
Ines Montani
bf7c2ea99a
Add merch link [ci skip] 2023-11-22 12:55:00 +01:00
Ines Montani
8f69e56a5a
Add swag [ci skip] 2023-11-20 14:42:01 +01:00
Lise
b6e022381d
Feature/nn and fo language extensions (#13116)
* add language extensions for norwegian nynorsk and faroese

* update docstring for nn/examples.py

* use relative imports

* add fo and nn tokenizers to pytest fixtures

* add unittests for fo and nn and fix bug in nn

* remove module docstring from fo/__init__.py

* add comments about example sentences' origin

* add license information to faroese data credit

* format unittests using black

* add __init__ files to tests/lang/nn and tests/lang/fo

* fix import order and use relative imports in fo/__init__.py and nn/__init__.py

* Make the tests a bit more compact

* Add fo and nn to website languages

* Add note about jul.

* Add "jul." as exception

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-11-20 07:49:59 +01:00
ajbond
9f2ce6bb00
Add Redfield NLP Nodes to the Spacy Universe (#13133) 2023-11-17 09:48:02 +01:00
Madeesh Kannan
bd2c17e206
Warn about reloading dependencies after downloading models (#13081)
* Update the "Missing factory" error message

This accounts for model installations that took place during the current Python session.

* Add a note about Jupyter notebooks

* Move error to `spacy.cli.download`
Add extra message for Jupyter sessions

* Add additional note for interactive sessions

* Remove note about `spacy-transformers` from error message

* `isort`

* Improve checks for colab (also helps displacy)

* Update warning messages

* Improve flow for multiple checks

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-11-10 08:05:07 +01:00
Adriane Boyd
513bbd5fa3
Add preferred use of build for package CLI (#13109)
Build with `build` if available. Warn and fall back to the previous
`setup.py`-based build if the `build`-based build fails.
2023-11-08 17:35:24 +01:00
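The build strategy in this commit can be sketched as follows: prefer `build`, and warn and fall back to `setup.py` if it fails. `build_sdist`, `run_command` and the choice of the `--sdist` target are illustrative assumptions and do not reproduce the code of spaCy's `package` CLI. The injectable `run` callable makes it possible to exercise the fallback path without invoking real builds.

```python
import importlib.util
import subprocess
import sys
import warnings
from typing import Callable, List, Optional, Sequence

Runner = Callable[[Sequence[str]], int]


def run_command(cmd: Sequence[str]) -> int:
    """Default runner: execute cmd and return its exit code."""
    return subprocess.call(list(cmd))


def build_sdist(run: Runner = run_command, have_build: Optional[bool] = None) -> List[str]:
    """Prefer `python -m build`; warn and fall back to setup.py if it fails.

    Illustrative sketch only; returns the command that succeeded.
    """
    if have_build is None:
        # `build` is usable only if the package is importable in this environment
        have_build = importlib.util.find_spec("build") is not None
    if have_build:
        cmd = [sys.executable, "-m", "build", "--sdist"]
        if run(cmd) == 0:
            return cmd
        warnings.warn("Building with 'build' failed; falling back to setup.py.")
    # Previous behavior: a setup.py-based build
    cmd = [sys.executable, "setup.py", "sdist"]
    if run(cmd) != 0:
        raise RuntimeError("setup.py-based build failed as well")
    return cmd
```

Passing a fake runner that fails for the `build` command shows the warning followed by the `setup.py` fallback.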
Ridge Kimani
2b8da84717
feat: add extra lexical attributes (#13106)
Co-authored-by: Ridge Kimani <ridgekimani@gmail.com>
2023-11-08 17:29:11 +01:00
Adriane Boyd
0c25725359
Update Tokenizer.explain for special cases with whitespace (#13086)
* Update Tokenizer.explain for special cases with whitespace

Update `Tokenizer.explain` to skip special case matches if the exact
text has not been matched due to intervening whitespace.

Enable fuzzy `Tokenizer.explain` tests with additional whitespace
normalization.

* Add unit test for special cases with whitespace, xfail fuzzy tests again
2023-11-06 17:29:59 +01:00
Adriane Boyd
ff9ddb6a07
Unskip python 3.12 remote tests (#13110) 2023-11-06 11:59:45 +01:00
Adriane Boyd
c096c5c0c9
Update for numpy 2.0 deprecations (#13103)
- Replace `np.trapz` with vendored `trapezoid` from scipy
- Replace `np.float_` with `np.float64`
2023-11-06 08:47:53 +01:00
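For context, NumPy 2.0 deprecates `np.trapz` in favor of `np.trapezoid` and removes the `np.float_` alias, whose replacement is `np.float64`. A vendored trapezoid helper only has to implement the trapezoidal rule. The dependency-free sketch below shows that rule for 1-D samples. It is an illustration and not the code spaCy vendored from SciPy, whose `trapezoid` also handles multi-dimensional input and an `axis` argument.

```python
from typing import Optional, Sequence


def trapezoid(y: Sequence[float], x: Optional[Sequence[float]] = None, dx: float = 1.0) -> float:
    """Integrate samples y with the trapezoidal rule (1-D only)."""
    if x is not None and len(x) != len(y):
        raise ValueError("x and y must have the same length")
    total = 0.0
    for i in range(1, len(y)):
        # Interval width: taken from x if given, otherwise the constant spacing dx
        width = (x[i] - x[i - 1]) if x is not None else dx
        # Area of one trapezoid: width times the mean of the two heights
        total += width * (y[i] + y[i - 1]) / 2.0
    return total


print(trapezoid([1.0, 2.0, 3.0]))  # 4.0
```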
Adriane Boyd
92f1d0a195
CI: Switch to stable python 3.12 and limit 3.11 runs (#13104) 2023-11-03 15:46:03 +01:00
Raphael Mitsch
c4e2daf6ef
Fix displacy span stacking (#13068)
* Fix displacy span stacking.

* Format. Remove counter.

* Remove test files.

* Add unit test. Refactor to allow for unit test.

* Fix off-by-one error in tests.
2023-11-02 12:02:18 +01:00
Sofie Van Landeghem
a804b83a4b
Update llm docs to clarify task-specific factories (#13082)
* fix typo

* add examples to specify custom model for task-specific factory
2023-10-31 22:07:07 +01:00
Sofie Van Landeghem
48248c62b6
Clarify EL example in docs (#13071)
* add comment that pipeline is a custom one

* add link to NEL tutorial

* prettier

* revert prettier reformat

* revert prettier reformat (2)

* fix typo

---------

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
2023-10-31 21:58:29 +01:00
Raphael Mitsch
0c15876502
Fix spancat typo. (#13095) 2023-10-31 13:45:10 +01:00
Raphael Mitsch
9deaac9786
Add note in docs on score_weight config if using a non-default spans_key for SpanCat (#13093)
* Add note on score_weight if using a non-default spans_key for SpanCat.

* Fix formatting.

* Fix formatting.

* Fix typo.

* Use warning infobox.

* Fix infobox formatting.
2023-10-30 17:02:08 +01:00
Sofie Van Landeghem
d717123819
Update LICENSE (#13078) 2023-10-23 11:59:18 +02:00
Adriane Boyd
a89eae9283
Set version to v3.7.2 (#13066) 2023-10-16 15:10:55 +02:00
Sofie Van Landeghem
699dd8b3b7
Update __all__ fields (#13063)
* update all for pipeline.init

* add all in training.init

* add all in kb.init

* alphabetically
2023-10-16 10:17:47 +02:00
Adriane Boyd
ea1befa8ff
Support Any comparisons for Token and Span (#13058)
* Support Any comparisons for Token and Span

* Preserve previous behavior for None
2023-10-12 11:53:33 +02:00
Raphael Mitsch
d72029d9c8
Add binary examples for Textcat task in spacy-llm (#13051)
* Add examples for binary classification.

* Fix example.

* Remove binary textcat example. Format.

* Rephrase.
2023-10-11 12:23:38 +02:00
Adriane Boyd
77c568e524
Restore spacy.cli.project API (#13053)
* Restore spacy.cli.project API

* Fix typing errors, add simple import test
2023-10-10 15:35:25 +02:00
Ines Montani
65e7bd54f5
Update usage sidebar and nav alert [ci skip] 2023-10-06 14:36:37 +02:00
Ines Montani
b83f1e3724
Inline displaCy visualizations in docs (#13050) [ci skip] 2023-10-06 14:22:43 +02:00
Raphael Mitsch
be29216fe2
Merge pull request #13044 from explosion/docs/llm_main
Sync `master` with `docs/llm_main`
2023-10-05 16:10:19 +02:00
Raphael Mitsch
1162fcf099
Add Mistral mentions. (#13037) 2023-10-05 14:44:38 +02:00
Raphael Mitsch
862f8254e8
Add docs on Azure OpenAI support in spacy-llm (#13043)
* Add gpt-3.5-turbo-instruct to list of supported OpenAI models.

* Update `spacy-llm` task argument docs w.r.t. task refactoring (#12995)

* Update task arguments w.r.t. task refactoring in 0.5.0.

* Add disclaimer w.r.t. gated models/Llama 2.

* Update website/docs/api/large-language-models.mdx

* Update website/docs/api/large-language-models.mdx

* Update docs w.r.t. PaLM support. (#13018)

* Add info on spacy.Azure.v1.

* Attempt to fix netlify check fails.

* Format.
2023-10-05 13:18:27 +02:00
Raphael Mitsch
1dec138e61
Update docs w.r.t. PaLM support. (#13018) 2023-10-05 08:50:41 +02:00
Adriane Boyd
6e54360a3d
Remove pathy dependency, update docs for cloudpathlib in Weasel (#13035) 2023-10-05 08:50:22 +02:00
Raphael Mitsch
734826db79
Update spacy-llm task argument docs w.r.t. task refactoring (#12995)
* Update task arguments w.r.t. task refactoring in 0.5.0.

* Add disclaimer w.r.t. gated models/Llama 2.

* Update website/docs/api/large-language-models.mdx

* Update website/docs/api/large-language-models.mdx
2023-10-05 08:45:25 +02:00
Raphael Mitsch
829613b959
Merge pull request #12999 from rmitsch/docs/gpt-3.5-turbo-instruct
Add `gpt-3.5-turbo-instruct` to list of supported OpenAI models
2023-10-05 08:41:07 +02:00
Adriane Boyd
9d036607f1
Set version to v3.7.1 (#13042) 2023-10-04 18:13:12 +02:00
Adriane Boyd
aec59c0088
Merge pull request #13040 from adrianeboyd/revert/12962-spacy-info
Revert "Load the cli module lazily for spacy.info (#12962)"
2023-10-04 17:20:32 +02:00
Adriane Boyd
6d0185f7fb
Revert "Load the cli module lazily for spacy.info (#12962)"
This reverts commit beda27a91e.
2023-10-04 12:33:33 +02:00
Adriane Boyd
92ce32aa3f
Update binder version to v3.7 (#13034) 2023-10-02 12:53:46 +02:00
Adriane Boyd
160e61772e
Docs for v3.7.0 (#13029)
* Docs for v3.7.0

* Minor fixes

* Extend Weasel notes

* Minor edits

* Update version in README
2023-10-01 21:40:07 +02:00
Adriane Boyd
0fed2d9e28
Merge pull request #12899 from adrianeboyd/chore/v3.7.0
Reenable model tests for v3.7.0
2023-10-01 21:38:17 +02:00
Adriane Boyd
1b043dde3f
Revert "disable tests until 3.7 models are available"
This reverts commit 991bcc111e.
2023-10-01 18:48:31 +02:00
Adriane Boyd
4ec41e98f6
Merge pull request #12979 from adrianeboyd/feature/cython-profile-312
Redesigned cython profiling and other minor updates for python 3.12
2023-09-29 08:23:38 +02:00
Matthew Hoffman
483d4a5bc0
Allow spacy-transformers v1.3.x in transformers extra (#13025)
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-09-29 08:22:56 +02:00
Adriane Boyd
6b4f774418
Set version to v3.7.0 (#13028) 2023-09-28 21:27:42 +02:00
Adriane Boyd
78504c25a5
CI: Add python 3.12.0rc2 2023-09-28 17:12:42 +02:00
Adriane Boyd
467c82439e
Always use tqdm with disable=None
`tqdm` can cause deadlocks in the test suite if enabled.
2023-09-28 17:12:42 +02:00
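tqdm documents `disable=None` as "disable on non-TTY". With that setting, progress bars are switched off when output is captured, as it is in a test runner, but still appear in an interactive terminal. The helper `resolve_disable` below is a hypothetical, dependency-free sketch of that rule, modeled on tqdm's check of the output stream's `isatty()`. In real code you would simply pass `disable=None` to `tqdm` itself.

```python
import io
import sys
from typing import Optional, TextIO


def resolve_disable(disable: Optional[bool], stream: TextIO = sys.stderr) -> bool:
    """Mirror tqdm's documented rule for `disable` (illustrative sketch)."""
    if disable is None:
        # None means: disable the bar when the output stream is not a terminal
        return hasattr(stream, "isatty") and not stream.isatty()
    return disable


# A captured stream (like the one pytest uses) is not a TTY, so the bar is disabled
print(resolve_disable(None, io.StringIO()))  # True
```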
Adriane Boyd
b4990395f9
Update mypy requirements 2023-09-28 17:12:42 +02:00