spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-10-02 18:06:46 +03:00

Author	SHA1	Message	Date
Daniël de Kok	81beaea70e	Merge remote-tracking branch 'upstream/master' into maintenance/v4-merge-master-20240119	2024-01-19 12:34:29 +01:00
Daniël de Kok	2891e27421	Merge pull request #13191 from explosion/maintenance/revert-parser-refactor Revert the parser refactor	2024-01-18 17:06:41 +01:00
Daniël de Kok	9972333ef9	Temporarily xfail local remote storage test	2024-01-17 10:20:40 +01:00
Daniël de Kok	7351f6bbeb	Update thinc dependency to 9.0.0.dev4	2024-01-16 15:56:09 +01:00
Daniël de Kok	e2a3952de5	Add spacy.TextCatParametricAttention.v1 (#13201) * Add spacy.TextCatParametricAttention.v1 This layer is a simplification of the ensemble classifier that only uses parametric attention. We have found empirically that with a sufficient amount of training data, using the ensemble classifier with BoW does not provide significant improvement in classifier accuracy. However, plugging in a BoW classifier does reduce GPU training and inference performance substantially, since it uses a GPU-only kernel. * Fix merge fallout	2024-01-02 10:03:06 +01:00
Daniël de Kok	7718886fa3	TransitionBasedParser.v2 in run example output Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-12-21 11:14:35 +01:00
Daniël de Kok	7ebba86402	Add TextCatReduce.v1 (#13181) * Add TextCatReduce.v1 This is a textcat classifier that pools the vectors generated by a tok2vec implementation and then applies a classifier to the pooled representation. Three reductions are supported for pooling: first, max, and mean. When multiple reductions are enabled, the reductions are concatenated before providing them to the classification layer. This model is a generalization of the TextCatCNN model, which only supports mean reductions and is a bit of a misnomer, because it can also be used with transformers. This change also reimplements TextCatCNN.v2 using the new TextCatReduce.v1 layer. * Doc fixes Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Fully specify `TextCatCNN` <-> `TextCatReduce` equivalence * Move TextCatCNN docs to legacy, in prep for moving to spacy-legacy * Add back a test for TextCatCNN.v2 * Replace TextCatCNN in pipe configurations and templates * Add an infobox to the `TextCatReduce` section with a `TextCatCNN` anchor * Add last reduction (`use_reduce_last`) * Remove non-working TextCatCNN Netlify redirect * Revert layer changes for the quickstart * Revert one more quickstart change * Remove unused import * Fix docstring * Fix setting name in error message --------- Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-12-21 11:00:06 +01:00
Daniël de Kok	532225b955	Fix parser distillation test seed The test would sometimes fail. Rather than making the test more reliable by increasing the number of training iterations, use a known-good seed.	2023-12-21 10:06:28 +01:00
Daniël de Kok	7b689bde44	No need for `Literal` compat, since we only support >= 3.8	2023-12-21 09:47:38 +01:00
Daniël de Kok	57203fa0fc	Fix `TransitionBasedParser` version in transformer embeddings docs	2023-12-19 09:28:20 +01:00
Daniël de Kok	5e8bafa5bb	Bring back W401	2023-12-18 20:17:24 +01:00
Daniël de Kok	9b36729cbd	Fix Cython lints	2023-12-18 20:02:15 +01:00
Steven Crowther	764be103bc	update README to include links to GPU processing, LLMs, and the spaCy blog. (#13197) * Update README.md to include links for GPU processing, LLM, and spaCy's blog. * Create ojo4f3.md * corrected README to most current version with links to GPU processing, LLMs, and the spaCy blog. * Delete .github/contributors/ojo4f3.md * changed LLM icon Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Apply suggestions from code review --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-12-18 09:49:07 +01:00
Sofie Van Landeghem	56fc3bc0f3	Type documentation fixes for Doc (#13187) * correct char_span output type - can be None * unify type of exclude parameter * black * further fixes to from_dict and to_dict * formatting	2023-12-18 09:00:47 +01:00
Ines Montani	7df328fbfe	Update README.md [ci skip]	2023-12-12 10:19:57 +01:00
Ines Montani	8cfccdd2f8	Update links [ci skip]	2023-12-11 15:51:43 +01:00
Ines Montani	f78b91c03b	Update links [ci skip]	2023-12-11 15:51:01 +01:00
Daniël de Kok	42fe4edfd7	Add distillation tests with max cut size And fix endless loop when the max cut size is 0 or 1.	2023-12-08 20:38:01 +01:00
Daniël de Kok	e2591cda36	isort	2023-12-08 20:24:09 +01:00
Daniël de Kok	e5ec45cb7e	Revert "Merge the parser refactor into `v4` (#10940)" This reverts commit `a183db3cef`.	2023-12-08 20:23:08 +01:00
Daniël de Kok	05803cfe76	Revert "Reimplement distillation with oracle cut size (#12214)" This reverts commit `e27c60a702`.	2023-12-08 14:38:05 +01:00
Adriane Boyd	e467573550	Docs: update trf_data examples and pipeline design info (#13164)	2023-12-04 15:15:54 +01:00
Daniël de Kok	da7ad97519	Update `TextCatBOW` to use the fixed `SparseLinear` layer (#13149) * Update `TextCatBOW` to use the fixed `SparseLinear` layer A while ago, we fixed the `SparseLinear` layer to use all available parameters: https://github.com/explosion/thinc/pull/754 This change updates `TextCatBOW` to `v3` which uses the new `SparseLinear_v2` layer. This results in a sizeable improvement on a text categorization task that was tested. While at it, this `spacy.TextCatBOW.v3` also adds the `length_exponent` option to make it possible to change the hidden size. Ideally, we'd just have an option called `length`. But the way that `TextCatBOW` uses hashes results in a non-uniform distribution of parameters when the length is not a power of two. * Replace TextCatBOW `length_exponent` parameter by `length` We now round up the length to the next power of two if it isn't a power of two. * Remove some tests for TextCatBOW.v2 * Fix missing import	2023-11-29 09:11:54 +01:00
Ines Montani	bf7c2ea99a	Add merch link [ci skip]	2023-11-22 12:55:00 +01:00
Ines Montani	8f69e56a5a	Add swag [ci skip]	2023-11-20 14:42:01 +01:00
Lise	b6e022381d	Feature/nn and fo language extensions (#13116) * add language extensions for Norwegian Nynorsk and Faroese * update docstring for nn/examples.py * use relative imports * add fo and nn tokenizers to pytest fixtures * add unittests for fo and nn and fix bug in nn * remove module docstring from fo/__init__.py * add comments about example sentences' origin * add license information to Faroese data credit * format unittests using black * add __init__ files to tests/lang/nn and tests/lang/fo * fix import order and use relative imports in fo/__init__.py and nn/__init__.py * Make the tests a bit more compact * Add fo and nn to website languages * Add note about jul. * Add "jul." as exception --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-11-20 07:49:59 +01:00
ajbond	9f2ce6bb00	Add Redfield NLP Nodes to the spaCy Universe (#13133)	2023-11-17 09:48:02 +01:00
Madeesh Kannan	bd2c17e206	Warn about reloading dependencies after downloading models (#13081) * Update the "Missing factory" error message This accounts for model installations that took place during the current Python session. * Add a note about Jupyter notebooks * Move error to `spacy.cli.download` Add extra message for Jupyter sessions * Add additional note for interactive sessions * Remove note about `spacy-transformers` from error message * `isort` * Improve checks for colab (also helps displacy) * Update warning messages * Improve flow for multiple checks --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-11-10 08:05:07 +01:00
Adriane Boyd	513bbd5fa3	Add preferred use of build for package CLI (#13109) Build with `build` if available. Warn and fall back to previous `setup.py`-based builds if the `build`-based build fails.	2023-11-08 17:35:24 +01:00
Ridge Kimani	2b8da84717	feat: add extra lexical attributes (#13106) Co-authored-by: Ridge Kimani <ridgekimani@gmail.com>	2023-11-08 17:29:11 +01:00
Adriane Boyd	0c25725359	Update Tokenizer.explain for special cases with whitespace (#13086) * Update Tokenizer.explain for special cases with whitespace Update `Tokenizer.explain` to skip special case matches if the exact text has not been matched due to intervening whitespace. Enable fuzzy `Tokenizer.explain` tests with additional whitespace normalization. * Add unit test for special cases with whitespace, xfail fuzzy tests again	2023-11-06 17:29:59 +01:00
Adriane Boyd	ff9ddb6a07	Unskip python 3.12 remote tests (#13110)	2023-11-06 11:59:45 +01:00
Adriane Boyd	c096c5c0c9	Update for numpy 2.0 deprecations (#13103) - Replace `np.trapz` with vendored `trapezoid` from scipy - Replace `np.float_` with `np.float64`	2023-11-06 08:47:53 +01:00
Adriane Boyd	92f1d0a195	CI: Switch to stable python 3.12 and limit 3.11 runs (#13104)	2023-11-03 15:46:03 +01:00
Raphael Mitsch	c4e2daf6ef	Fix displacy span stacking (#13068) * Fix displacy span stacking. * Format. Remove counter. * Remove test files. * Add unit test. Refactor to allow for unit test. * Fix off-by-one error in tests.	2023-11-02 12:02:18 +01:00
Sofie Van Landeghem	a804b83a4b	Update llm docs to clarify task-specific factories (#13082) * fix typo * add examples to specify custom model for task-specific factory	2023-10-31 22:07:07 +01:00
Sofie Van Landeghem	48248c62b6	Clarify EL example in docs (#13071) * add comment that pipeline is a custom one * add link to NEL tutorial * prettier * revert prettier reformat * revert prettier reformat (2) * fix typo Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com> --------- Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>	2023-10-31 21:58:29 +01:00
Raphael Mitsch	0c15876502	Fix spancat typo. (#13095)	2023-10-31 13:45:10 +01:00
Raphael Mitsch	9deaac9786	Add note in docs on `score_weight` config if using a non-default `spans_key` for SpanCat (#13093) * Add note on score_weight if using a non-default span_key for SpanCat. * Fix formatting. * Fix formatting. * Fix typo. * Use warning infobox. * Fix infobox formatting.	2023-10-30 17:02:08 +01:00
Sofie Van Landeghem	d717123819	Update LICENSE (#13078)	2023-10-23 11:59:18 +02:00
Adriane Boyd	a89eae9283	Set version to v3.7.2 (#13066)	2023-10-16 15:10:55 +02:00
Sofie Van Landeghem	699dd8b3b7	Update __all__ fields (#13063) * update all for pipeline.init * add all in training.init * add all in kb.init * alphabetically	2023-10-16 10:17:47 +02:00
Adriane Boyd	ea1befa8ff	Support Any comparisons for Token and Span (#13058) * Support Any comparisons for Token and Span * Preserve previous behavior for None	2023-10-12 11:53:33 +02:00
Raphael Mitsch	d72029d9c8	Add binary examples for Textcat task in `spacy-llm` (#13051) * Add examples for binary classification. * Fix example. * Remove binary textcat example. Format. * Rephrase.	2023-10-11 12:23:38 +02:00
Adriane Boyd	77c568e524	Restore spacy.cli.project API (#13053) * Restore spacy.cli.project API * Fix typing errors, add simple import test	2023-10-10 15:35:25 +02:00
Ines Montani	65e7bd54f5	Update usage sidebar and nav alert [ci skip]	2023-10-06 14:36:37 +02:00
Ines Montani	b83f1e3724	Inline displaCy visualizations in docs (#13050) [ci skip]	2023-10-06 14:22:43 +02:00
Raphael Mitsch	be29216fe2	Merge pull request #13044 from explosion/docs/llm_main Sync `master` with `docs/llm_main`	2023-10-05 16:10:19 +02:00
Raphael Mitsch	1162fcf099	Add Mistral mentions. (#13037)	2023-10-05 14:44:38 +02:00
Raphael Mitsch	862f8254e8	Add docs on Azure OpenAI support in `spacy-llm` (#13043) * Add gpt-3.5-turbo-instruct to list of supported OpenAI models. * Update `spacy-llm` task argument docs w.r.t. task refactoring (#12995) * Update task arguments w.r.t. task refactoring in 0.5.0. * Add disclaimer w.r.t. gated models/Llama 2. * Update website/docs/api/large-language-models.mdx * Update website/docs/api/large-language-models.mdx * Update docs w.r.t. PaLM support. (#13018) * Add info on spacy.Azure.v1. * Attempt to fix netlify check fails. * Attempt to fix netlify check fails. * Attempt to fix netlify check fails. * Attempt to fix netlify check fails. * Attempt to fix netlify check fails. * Attempt to fix netlify check fails. * Attempt to fix netlify check fails. * Attempt to fix netlify check fails. * Attempt to fix netlify check fails. * Format.	2023-10-05 13:18:27 +02:00
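
Commit da7ad97519 says that `TextCatBOW` now rounds `length` up to the next power of two when it is not already one. Here is a minimal Python sketch of that rounding rule only. The function name `next_power_of_two` is hypothetical, and this is not spaCy's actual implementation.

```python
def next_power_of_two(length: int) -> int:
    """Round a positive length up to the nearest power of two (hypothetical helper)."""
    if length < 1:
        raise ValueError("length must be a positive integer")
    # (length - 1).bit_length() is the exponent of the smallest power of two >= length;
    # powers of two are therefore returned unchanged.
    return 1 << (length - 1).bit_length()


for n in (1, 3, 262144, 300000):
    print(n, "->", next_power_of_two(n))  # 1 -> 1, 3 -> 4, 262144 -> 262144, 300000 -> 524288
```

Using `int.bit_length` keeps the computation in exact integer arithmetic and avoids the rounding pitfalls of a floating-point `log2`.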
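
Commit 7ebba86402 describes `TextCatReduce.v1` as pooling the tok2vec output with first, max, and mean reductions (a last reduction is added later in the same PR) and concatenating the enabled reductions before the classification layer. Here is a plain-Python sketch of that pooling step, assuming one list of token vectors per document. `pool_doc` and its flags are hypothetical names that only loosely mirror the `use_reduce_last` setting named in the commit. The concatenation order is also an assumption, not necessarily spaCy's.

```python
from typing import List


def pool_doc(
    token_vectors: List[List[float]],
    use_first: bool = True,
    use_last: bool = False,
    use_max: bool = True,
    use_mean: bool = True,
) -> List[float]:
    """Pool per-token vectors into one document vector (hypothetical helper)."""
    if not token_vectors:
        raise ValueError("expected at least one token vector")
    columns = list(zip(*token_vectors))  # one tuple per embedding dimension
    pooled: List[float] = []
    # Each enabled reduction yields a width-sized vector; the results are concatenated.
    if use_first:
        pooled.extend(token_vectors[0])
    if use_last:
        pooled.extend(token_vectors[-1])
    if use_max:
        pooled.extend(max(col) for col in columns)
    if use_mean:
        pooled.extend(sum(col) / len(col) for col in columns)
    if not pooled:
        raise ValueError("enable at least one reduction")
    return pooled


vectors = [[1.0, 2.0], [3.0, 0.0], [5.0, 4.0]]
print(pool_doc(vectors))  # first + max + mean -> [1.0, 2.0, 5.0, 4.0, 3.0, 2.0]
```

Because the reductions are concatenated, the pooled width is the token width multiplied by the number of enabled reductions. The classifier's input size therefore depends on which reductions are switched on.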

16225 Commits