spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-10-02 18:06:46 +03:00

Author	SHA1	Message	Date
Adriane Boyd	6b4f774418	Set version to v3.7.0 (#13028 )	2023-09-28 21:27:42 +02:00
Adriane Boyd	78504c25a5	CI: Add python 3.12.0rc2	2023-09-28 17:12:42 +02:00
Adriane Boyd	467c82439e	Always use tqdm with `disable=None` `tqdm` can cause deadlocks in the test suite if enabled.	2023-09-28 17:12:42 +02:00
Adriane Boyd	b4990395f9	Update mypy requirements	2023-09-28 17:12:42 +02:00
Adriane Boyd	76d94b31f2	Branch on python 3.12+ shutil.rmtree in make_tempdir	2023-09-28 17:09:41 +02:00
Adriane Boyd	1adf79414e	Set cython profiling default to True for <3.12, False for >=3.12	2023-09-28 17:09:41 +02:00
Adriane Boyd	538304948e	Remove profile=True from currently profiled cython	2023-09-28 17:09:41 +02:00
Adriane Boyd	55614d6799	Add profile=False to currently unprofiled cython	2023-09-28 17:09:41 +02:00
Adriane Boyd	36201cb6a1	Merge pull request #13027 from adrianeboyd/chore/update-develop-from-master-v3.7-1 Update develop from master for v3.7	2023-09-28 16:01:40 +02:00
Adriane Boyd	406794a081	Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.7-1	2023-09-28 15:09:06 +02:00
Daniël de Kok	beda27a91e	Load the cli module lazily for spacy.info (#12962 ) * Load the cli module lazily for spacy.info This avoids that the `spacy` module cannot be imported when the user chooses not to install `typer`/`requests`. * Add test --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-09-28 11:36:44 +02:00
Sergiu Nisioi	6255e38695	Adding rolegal model to the spaCy universe (#13017 ) * adding rolegal model to the spaCy universe * Fix formatting * Use raw URL * update image url and example * fix pip and update url to raw * okay, let's add thumb instead of image 🐙 * Update website/meta/universe.json --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-09-28 11:06:50 +02:00
Madeesh Kannan	b4501db6f8	Update emoji library in rule-based matcher example (#13014 )	2023-09-25 18:20:30 +02:00
Adriane Boyd	ff4215f1c7	Drop support for python 3.6 (#13009 ) * Drop support for python 3.6 * Update docs	2023-09-25 14:48:38 +02:00
Adriane Boyd	935a5455b6	Docs: add new tag for evaluate CLI --spans-keys (#13013 )	2023-09-25 11:49:28 +02:00
Ikko Eltociear Ashimine	ed8c11e2aa	Fix typo in lemmatizer.py (#13003 ) specfic -> specific	2023-09-25 11:44:35 +02:00
Eliana Vornov	4e3360ad12	add --spans-key option for CLI spancat evaluation (#12981 ) * add span key option for CLI evaluation * Rephrase CLI help to refer to Doc.spans instead of spancat * Rephrase docs to refer to Doc.spans instead of spancat --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-09-25 11:25:41 +02:00
Raphael Mitsch	bef9f63e13	Add gpt-3.5-turbo-instruct to list of supported OpenAI models.	2023-09-21 11:28:58 +02:00
Raphael Mitsch	830eba5426	Merge pull request #12994 from explosion/docs/llm_main Synch `llm_develop` with `llm_main`	2023-09-20 10:05:40 +02:00
Raphael Mitsch	163ec6fba8	Merge pull request #12993 from explosion/master Synch `llm_main` with `master`	2023-09-20 10:04:35 +02:00
Sofie Van Landeghem	8f0d6b0a8c	Fix in BertTokenizer docs (#12955 ) * fix BertWordPieceTokenizer constructor call * fix * Update website/docs/usage/linguistic-features.mdx --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-09-13 13:21:58 +02:00
Adriane Boyd	36d4767aca	Skip project remotes test for python 3.12 (#12980 ) `weasel` (using `cloudpathlib`) does not currently support remote paths for python 3.12.	2023-09-13 13:16:05 +02:00
Sofie Van Landeghem	013762be41	Few spacy-llm doc fixes (#12969 ) * fix construction example * shorten task-specific factory list * small edits to HF models * small edit to API models * typo * fix space Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com> --------- Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>	2023-09-08 11:35:38 +02:00
Sofie Van Landeghem	def7013eec	Docs for spacy-llm 0.5.0 (#12968 ) * Update incorrect example config. (#12893) * spacy-llm docs cleanup (#12945) * Shorten NER section * fix template references * simplify sections * set temperature to 0.0 in examples * condense model information * fix parameters for REST models * set temperature to 0.0 * spelling fix * trigger preview * fix quotes * add small note on noop.v1 * move up example noop config * set appropriate model example configs * explain config * fix Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com> --------- Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com> * Docs for ner.v3 and spancat.v3 spacy-llm tasks (#12949) * formatting * update usage table with NER.v3 * fix typo in links * v3 overview of parameters * add spancat.v3 * add further v3 explanations * remove TODO comment * few more small fixes * Add doc section on LLM + task factories (#12905) * Add section on LLM + task factories. * Apply suggestions from code review --------- Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * add default config to openai models (#12961) * Docs for spacy-llm 0.5.0 (#12967) * simplify Python example * simplify Python example * Refer only to latest OpenAI model versions from usage doc * Typo fix Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com> * clarify accuracy claim --------- Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com> --------- Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>	2023-09-08 10:25:14 +02:00
Magdalena Aniol	cc78847688	fix training.batch_size example (#12963 )	2023-09-06 16:38:13 +02:00
Sofie Van Landeghem	6d1f6d9a23	Fix LLM usage example (#12950 ) * fix usage example * revert back to v2 to allow hot fix on main	2023-09-04 09:05:50 +02:00
Sofie Van Landeghem	5c1f9264c2	fix typo in link (#12948 ) * fix typo in link * fix REL.v1 parameter	2023-09-01 13:47:20 +02:00
David Berenstein	065ead4eed	updated `add_pipe` docs (#12947 )	2023-09-01 11:05:36 +02:00
vincent d warmerdam	3e4264899c	Update large-language-models.mdx (#12944 )	2023-08-30 11:58:14 +02:00
Ines Montani	52758e1afa	Add headers to netlify.toml [ci skip]	2023-08-30 11:55:23 +02:00
Vinit Ravishankar	c2303858e6	Documentation for spacy-curated-transformers (#12677 ) * initial * initial documentation run * fix typo * Remove mentions of Torchscript and quantization Both are disabled in the initial release of `spacy-curated-transformers`. * Fix `piece_encoder` entries * Remove `spacy-transformers`-specific warning * Fix duplicate entries in tables * Doc fixes Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Remove type aliases * Fix copy-paste typo * Change `debug pieces` version tag to `3.7` * Set curated transformers API version to `3.7` * Fix transformer listener naming * Add docs for `init fill-config-transformer` * Update CLI command invocation syntax * Update intro section of the pipeline component docs * Fix source URL * Add a note to the architectures section about the `init fill-config-transformer` CLI command * Apply suggestions from code review Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update CLI command name, args * Remove hyphen from the `curated-transformers.mdx` filename * Fix links * Remove placeholder text * Add text to the model/tokenizer loader sections * Fill in the `DocTransformerOutput` section * Formatting fixes * Add curated transformer page to API docs sidebar * More formatting fixes * Remove TODO comment * Remove outdated info about default config * Apply suggestions from code review Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Add link to HF model hub * `prettier` --------- Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2023-08-29 17:52:16 +02:00
PD Hall	d8a32c1050	docs: fix ngram_range_suggester max_size description (#12939 )	2023-08-29 11:10:58 +02:00
Sofie Van Landeghem	869cc4ab0b	warn when an unsupported/unknown key is given to the dependency matcher (#12928 )	2023-08-22 09:03:35 +02:00
Connor Brinton	6dd56868de	📝 Fix formula for receptive field in docs (#12918 ) SpaCy's HashEmbedCNN layer performs convolutions over tokens to produce contextualized embeddings using a `MaxoutWindowEncoder` layer. These convolutions are implemented using Thinc's `expand_window` layer, which concatenates `window_size` neighboring sequence items on either side of the sequence item being processed. This is repeated across `depth` convolutional layers. For example, consider the sequence "ABCDE" and a `MaxoutWindowEncoder` layer with a context window of 1 and a depth of 2. We'll focus on the token "C". We can visually represent the contextual embedding produced for "C" as: ```mermaid flowchart LR A0(A<sub>0</sub>) B0(B<sub>0</sub>) C0(C<sub>0</sub>) D0(D<sub>0</sub>) E0(E<sub>0</sub>) B1(B<sub>1</sub>) C1(C<sub>1</sub>) D1(D<sub>1</sub>) C2(C<sub>2</sub>) A0 --> B1 B0 --> B1 C0 --> B1 B0 --> C1 C0 --> C1 D0 --> C1 C0 --> D1 D0 --> D1 E0 --> D1 B1 --> C2 C1 --> C2 D1 --> C2 ``` Described in words, this graph shows that before the first layer of the convolution, the "receptive field" centered at each token consists only of that same token. That is to say, that we have a receptive field of 1. The first layer of the convolution adds one neighboring token on either side to the receptive field. Since this is done on both sides, the receptive field increases by 2, giving the first layer a receptive field of 3. The second layer of the convolutions adds an _additional_ neighboring token on either side to the receptive field, giving a final receptive field of 5. However, this doesn't match the formula currently given in the docs, which read: > The receptive field of the CNN will be > `depth * (window_size * 2 + 1)`, so a 4-layer network with a window > size of `2` will be sensitive to 20 words at a time. 
Substituting in our depth of 2 and window size of 1, this formula gives us a receptive field of: ``` depth * (window_size * 2 + 1) = 2 * (1 * 2 + 1) = 2 * (2 + 1) = 2 * 3 = 6 ``` This not only doesn't match our computations from above, it's also an even number! This is suspicious, since the receptive field is supposed to be centered on a token, and not between tokens. Generally, this formula results in an even number for any even value of `depth`. The error in this formula is that the adjustment for the center token is multiplied by the depth, when it should occur only once. The corrected formula, `depth * window_size * 2 + 1`, gives the correct value for our small example from above: ``` depth * window_size * 2 + 1 = 2 * 1 * 2 + 1 = 4 + 1 = 5 ``` These changes update the docs to correct the receptive field formula and the example receptive field size.	2023-08-21 10:52:32 +02:00
Adriane Boyd	198488ee86	Extend to weasel v0.3 (#12908 ) * Extend to weasel v0.3 * Clean up unused imports in test_cli	2023-08-16 17:36:53 +02:00
Adriane Boyd	76a9f9c6c6	Docs: clarify abstract spacy.load examples (#12889 )	2023-08-16 17:28:34 +02:00
William Mattingly	64b8ee2dbe	Update universe.json (#12904 ) * Update universe.json added hobbit-spacy to the universe json * Update universe.json removed displacy from hobbit-spacy and added a default text.	2023-08-14 16:44:14 +02:00
denizcodeyaa	d50b8d51e2	Update examples.py (#12895 ) Add: example sentences to improve the Turkish model. Let's get the tr_web_core_sm out in the world yaa	2023-08-11 15:38:06 +02:00
Adriane Boyd	6a4aa43164	Extend to thinc v8.2 (#12897 )	2023-08-11 13:05:46 +02:00
Adriane Boyd	9622c11529	Extend to weasel v0.2 (#12902 )	2023-08-11 10:59:51 +02:00
Adriane Boyd	6ef29c4115	Merge pull request #12901 from adrianeboyd/feature/spacy-transformers-v1.3-revert Revert "Extend to spacy-transformers v1.3.x (#12877)"	2023-08-10 16:43:10 +02:00
Adriane Boyd	060241a8d5	Revert "Extend to spacy-transformers v1.3.x (#12877 )" This reverts commit `e5773e0c69`.	2023-08-10 11:42:09 +02:00
Adriane Boyd	458bc5f45c	Set version to v3.6.1 (#12892 )	2023-08-08 15:04:13 +02:00
Adriane Boyd	c4e378df97	Update CuPy extras (#12890 ) * Add `cuda12x` for `cupy-cuda12x`. * Drop `cuda-autodetect` from quickstart, set default to `cuda11x` instead.	2023-08-08 12:58:28 +02:00
Adriane Boyd	245e2ddc25	Allow pydantic v2 using transitional v1 support (#12888 )	2023-08-08 11:27:28 +02:00
Adriane Boyd	45af8a5dcf	Update br tags (#12882 ) * Fix displacy br tag * Prefer <br>, also update package CLI	2023-08-04 10:52:41 +02:00
Sofie Van Landeghem	3b7faf4f5e	fix (#12881 )	2023-08-03 08:37:43 +02:00
Arman Mohammadi	07407e07ab	fix the regular expression matching on the full text (#12883 ) There was a mistake in the regex pattern which caused it not to match all the desired tokens. The problem was that when we use the r string literal prefix to denote a raw string, we should not use two backslashes to represent a single backslash.	2023-08-02 16:52:26 +02:00
Adriane Boyd	e5773e0c69	Extend to spacy-transformers v1.3.x (#12877 )	2023-08-02 09:35:16 +02:00
Sofie Van Landeghem	0737443096	feat: add example stubs (3) (#12801 ) * feat: add example stubs * fix: add required annotations * fix: mypy issues * fix: use Py36-compatible Protocol * Minor reformatting * adding further type specifications and removing internal methods * black formatting * widen type to iterable * add private methods that are being used by the built-in convertors * revert changes to corpus.py * fixes * fixes * fix typing of PlainTextCorpus --------- Co-authored-by: Basile Dura <basile@bdura.me> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-08-02 08:15:12 +02:00
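
The receptive field arithmetic described in commit 6dd56868de can be checked with a short sketch. The function names below are illustrative only and are not part of spaCy or Thinc. The third function counts the field layer by layer, assuming each convolutional layer adds `window_size` tokens on either side, as the commit message describes.

```python
def receptive_field(depth: int, window_size: int) -> int:
    """Corrected formula from the commit: depth * window_size * 2 + 1."""
    return depth * window_size * 2 + 1


def receptive_field_old(depth: int, window_size: int) -> int:
    """Formula previously given in the docs: depth * (window_size * 2 + 1)."""
    return depth * (window_size * 2 + 1)


def receptive_field_by_expansion(depth: int, window_size: int) -> int:
    """Count the field layer by layer (hypothetical helper for checking).

    Start from the center token alone, then let every layer add
    window_size neighboring tokens on each side.
    """
    width = 1
    for _ in range(depth):
        width += 2 * window_size
    return width


if __name__ == "__main__":
    # The "ABCDE" example from the commit message: depth 2, window size 1.
    print(receptive_field(2, 1), receptive_field_old(2, 1))
    # The docs example: a 4-layer network with a window size of 2.
    print(receptive_field(4, 2), receptive_field_old(4, 2))
```

For the "ABCDE" example the corrected formula and the layer-by-layer count both give 5, while the old formula gives 6. For the 4-layer, window size 2 case, the old formula gives the 20 quoted in the docs and the corrected one gives 17.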

16185 Commits
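
The raw-string pitfall described in commit 07407e07ab can be illustrated with a minimal sketch. The sample text and patterns below are hypothetical and are not the ones changed in that commit; they only show how a doubled backslash inside an `r"..."` literal changes what the regular expression matches.

```python
import re

# Hypothetical sample text; it contains digits but no backslash characters.
text = "Call 555 1234 or 555 9876"

# In a raw string, r"\\d+" is the regex `\\d+`: a literal backslash
# followed by one or more letters "d". It does not mean "digits".
doubled = r"\\d+"

# r"\d+" is the regex `\d+`: one or more digits.
single = r"\d+"

doubled_matches = re.findall(doubled, text)
single_matches = re.findall(single, text)

if __name__ == "__main__":
    print(doubled_matches)
    print(single_matches)
```

Without the `r` prefix, `"\\d+"` and `r"\d+"` denote the same pattern, which is likely how doubled backslashes end up in raw strings by mistake.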