15907 Commits

Author SHA1 Message Date
Adriane Boyd
b8d40cae3e Support overriding registered functions in configs (#12623)
Support overriding registered functions in configs. Previously the registry name was parsed as a section name rather than as a registry name.
2023-06-28 10:03:27 +02:00
Adriane Boyd
1f663b7c33 Address issues with source with component names and replacing listeners (#12701)
When sourcing a component, the object from the original pipeline is added to the new pipeline as the same object. This creates a situation where there are several attributes that cannot be in sync between the original pipeline and the new pipeline at the same time for this one object:

* component.name
* component.listener_map / component.listening_components for tok2vec and transformer

When running replace_listeners on a component, the config is not updated correctly if the state of the component is incorrect for the current pipeline (in particular changes that should be applied from model.attrs["replace_listener_cfg"] as used in spacy-transformers) due to the fact that:

* find_listeners relies on component.name to set the name in the listener_map
* replace_listeners relies on listener_map to determine how to modify the configs

In addition, there are several places where pipeline components are modified and the listener map and/or internal component names aren't currently updated.

In cases where there is a component shared by two pipelines that cannot be in sync, this PR chooses to prioritize the most recently modified or initialized pipeline. There is no actual solution with the current source behavior that will make both pipelines usable, so the current pipeline is updated whenever components are added/renamed/removed or the pipeline is initialized for training.
2023-06-28 10:03:27 +02:00
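The shared-object problem described in 1f663b7c33 can be illustrated with a toy model. This is only a sketch: `Component` and `Pipeline` below are hypothetical classes, not spaCy's, and the "last writer wins" rule in `add` is a simplified reading of the commit message's choice to prioritize the most recently modified pipeline.

```python
from typing import Dict


class Component:
    """Hypothetical pipeline component that stores a single name."""

    def __init__(self, name: str) -> None:
        self.name = name


class Pipeline:
    """Hypothetical pipeline that maps names to component objects."""

    def __init__(self) -> None:
        self.components: Dict[str, Component] = {}

    def add(self, component: Component, name: str) -> None:
        self.components[name] = component
        # The object holds only one name, so the pipeline modified
        # most recently overwrites whatever was stored before.
        component.name = name


original = Pipeline()
ner = Component("ner")
original.add(ner, "ner")

# "Sourcing" reuses the same object under a different name.
new = Pipeline()
new.add(ner, "sourced_ner")

shared = new.components["sourced_ner"] is original.components["ner"]
name_seen_by_both = ner.name
```

Because both pipelines hold the same object, `original` now sees a component whose internal name no longer matches its own key, which is the kind of mismatch the commit message says cannot be resolved for both pipelines at once.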
Adriane Boyd
0b190c5d42 Address numpy 1.25 deprecations in test suite (#12684)
* Address upcoming numpy v1.25 deprecations in test suite

* Temporarily test most recent numpy prerelease in CI

* Revert "Temporarily test most recent numpy prerelease in CI"

This reverts commit d75a66e55e.
2023-06-28 10:03:27 +02:00
Basile Dura
6c2f601812 build: bump typer version to accept >=0.3<0.10 (#12631) 2023-06-28 09:54:16 +02:00
Adriane Boyd
512241e124 Set version to v3.5.3 (#12628) 2023-05-12 11:50:08 +02:00
Adriane Boyd
424e917c6c Merge pull request #12626 from adrianeboyd/backport/v3.5.3-1
Backports for v3.5.3
2023-05-12 11:18:12 +02:00
Kenneth Enevoldsen
9beaec6a03 docs: remove invalid huggingface-hub push argument (#12624) 2023-05-12 09:55:22 +02:00
royashcenazi
931a46308f Parsigs universe 3 (#12617)
* parsigs universe

* added model installation explanation in the description

* Update website/meta/universe.json

Co-authored-by: Basile Dura <bdura@users.noreply.github.com>

* added model installation instruction in the code example

* added biomedical category

---------

Co-authored-by: Basile Dura <bdura@users.noreply.github.com>
2023-05-12 09:55:22 +02:00
royashcenazi
7c49d251c7 parsigs universe (#12616)
* parsigs universe

* added model installation explanation in the description

* Update website/meta/universe.json

Co-authored-by: Basile Dura <bdura@users.noreply.github.com>

* added model installation instruction in the code example

---------

Co-authored-by: Basile Dura <bdura@users.noreply.github.com>
2023-05-12 09:55:22 +02:00
David Berenstein
81488fa88b chore: added adept-augmentations to the spacy universe (#12609)
* chore: added adept-augmentations to the spacy universe

* Apply suggestions from code review

Co-authored-by: Basile Dura <bdura@users.noreply.github.com>

* Update universe.json

---------

Co-authored-by: Basile Dura <bdura@users.noreply.github.com>
2023-05-12 09:55:22 +02:00
Patrick J. Burns
54d9198e62 Fix typo (#12615) 2023-05-12 09:55:22 +02:00
Patrick J. Burns
7ae4fc19a1 Add LatinCy models to universe.json (#12597)
* Add LatinCy models to universe.json

* Update website/meta/universe.json

Add install code for LatinCy models to 'code_example'

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update LatinCy ‘code_example’ in website/meta/universe.json

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-05-12 09:55:22 +02:00
Adriane Boyd
2cfbc1209d In initialize only calculate current vectors hash if needed (#12607) 2023-05-12 09:55:22 +02:00
Adriane Boyd
42e5043816 Remove #egg from download URLs (#12567)
The current URLs will become invalid in pip 25.0. According to the pip
docs, the egg= URLs are currently only needed for editable VCS installs.
2023-05-12 09:55:22 +02:00
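As a rough illustration of the change in 42e5043816, the helper below strips an `#egg=` fragment from a URL using only the standard library. The function name `strip_egg_fragment` and the example URL are assumptions made for this sketch; it does not reproduce how spaCy builds its download URLs.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_egg_fragment(url: str) -> str:
    """Drop a '#egg=...' fragment from a URL; leave other fragments alone."""
    parts = urlsplit(url)
    if parts.fragment.startswith("egg="):
        parts = parts._replace(fragment="")
    return urlunsplit(parts)


# example.invalid is a placeholder host, not a real download location.
before = "https://example.invalid/pkg-1.0.0/pkg-1.0.0.tar.gz#egg=pkg==1.0.0"
after = strip_egg_fragment(before)
```

Fragments that do not start with `egg=` (for example a hash fragment) pass through unchanged.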
Kenneth Enevoldsen
4e1db35f6e Update inmemorylookupkb.mdx (#12586)
Example does not refer to the in memory lookup
2023-05-12 09:55:22 +02:00
Lj Miranda
9ec12fcfde Add spans in spacy benchmark (#12575)
* Add spans in spacy benchmark

The current implementation of spaCy benchmark accuracy / spacy evaluate
doesn't include the "spans" type, so calling the command doesn't render
the HTML displaCy file needed.

This PR attempts to fix that by creating a new parameter for "spans"
and calling the appropriate displaCy value.

* Reformat file with black

* Add tests for evaluate

* Fix spans -> span for displacy style

* Update test to check render instead

* Update source so mypy passes

* Add parser information to avoid warnings
2023-05-12 09:55:22 +02:00
Adriane Boyd
139368d9ce CI: Only run test suite once with thinc-apple-ops for macos python 3.11 (#12436)
* CI: Only run test suite once with thinc-apple-ops for macos python 3.11

* Adjust syntax

* Try alternate syntax

* Try alternate syntax

* Try alternate syntax
2023-05-12 09:55:22 +02:00
kadarakos
0de1f8bf73 Spancat speed improvement (#12577)
* avoid nesting then flattening

* mypy fix

* Apply suggestions from code review

* Add type for indices

* Run full matrix for mypy

* Add back modified type: ignore

* Revert "Run full matrix for mypy"

This reverts commit e218873d04.

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-05-12 09:55:22 +02:00
Victoria
1f8f910554 Add spacy-wasm to universe (#12572)
* add spacy-wasm to universe

* add tag
2023-05-12 09:55:22 +02:00
moxley01
e9945ccd04 add spacysee project (#12568) 2023-05-12 09:55:22 +02:00
Adriane Boyd
664a53ffbe CI: Disable Azure (#12560) 2023-05-12 09:55:22 +02:00
Adriane Boyd
e05b2ccc7c Add default option to MorphAnalysis.get (#12545)
* Add default to MorphAnalysis.get

Similar to `dict`, allow a `default` option for `MorphAnalysis.get` for
the user to provide a default return value if the field is not found.
The default return value remains `[]`, which is not the same as
`dict.get`, but is already established as this method's default return
value with the return type `List[str]`. However the new `default` option
does not enforce that the user-provided default is actually `List[str]`.

* Restore test case
2023-05-12 09:55:22 +02:00
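The `default` semantics described in e05b2ccc7c can be sketched without spaCy. `MorphSketch` below is a hypothetical stand-in, not spaCy's `MorphAnalysis`; it mirrors only what the commit message states: a missing field returns `[]` unless the caller passes `default`, and the type of `default` is not enforced.

```python
from typing import Dict, List, Optional


class MorphSketch:
    """Hypothetical stand-in for a morphological analysis (not spaCy's class)."""

    def __init__(self, feats: str) -> None:
        # Parse a FEATS-style string such as "Case=Nom|Number=Sing".
        self._fields: Dict[str, List[str]] = {}
        for pair in filter(None, feats.split("|")):
            field, _, values = pair.partition("=")
            self._fields[field] = values.split(",")

    def get(self, field: str, default: Optional[List[str]] = None) -> List[str]:
        # Unlike dict.get, a missing field yields [] when no default is given.
        if field in self._fields:
            return list(self._fields[field])
        return [] if default is None else default


morph = MorphSketch("Case=Nom|Number=Sing")
case_values = morph.get("Case")
missing_values = morph.get("Tense")
fallback_values = morph.get("Tense", default=["Pres"])
```

Keeping `[]` as the fallback preserves the established `List[str]` return type for existing callers, while `default` gives new callers a `dict.get`-like escape hatch.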
Adriane Boyd
357fdd4871 Load exceptions last in Tokenizer.from_bytes (#12553)
In `Tokenizer.from_bytes`, the exceptions should be loaded last so that
they are only processed once as part of loading the model.

The exceptions are tokenized as phrase matcher patterns in the
background and the internal tokenization needs to be synced with all the
remaining tokenizer settings. If the exceptions are not loaded last,
there are speed regressions for `Tokenizer.from_bytes/disk` vs.
`Tokenizer.add_special_case` as the caches are reloaded more than
necessary during deserialization.
2023-05-12 09:55:22 +02:00
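The ordering argument in 357fdd4871 can be shown with a toy model that counts how often exceptions are reprocessed. `ToyTokenizer` and `load` are hypothetical and heavily simplified; they assume, as the commit message describes, that changing any tokenizer setting forces already-loaded exceptions to be processed again.

```python
from typing import Dict, List


class ToyTokenizer:
    """Hypothetical tokenizer whose settings invalidate processed exceptions."""

    def __init__(self) -> None:
        self.settings: Dict[str, str] = {}
        self.exceptions: List[str] = []
        self.exception_passes = 0  # times the exceptions were (re)processed

    def set_setting(self, name: str, value: str) -> None:
        self.settings[name] = value
        # Any setting change means loaded exceptions must be redone.
        if self.exceptions:
            self._process_exceptions()

    def set_exceptions(self, exceptions: List[str]) -> None:
        self.exceptions = list(exceptions)
        self._process_exceptions()

    def _process_exceptions(self) -> None:
        self.exception_passes += 1


def load(order: List[str]) -> int:
    """Apply deserialized fields in the given order; return the pass count."""
    tok = ToyTokenizer()
    for key in order:
        if key == "exceptions":
            tok.set_exceptions(["don't", "U.S."])
        else:
            tok.set_setting(key, "pattern")
    return tok.exception_passes


exceptions_first = load(["exceptions", "prefix", "suffix", "infix"])
exceptions_last = load(["prefix", "suffix", "infix", "exceptions"])
```

In this model, loading the exceptions first costs one pass plus one more per later setting, while loading them last costs a single pass.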
Sofie Van Landeghem
7bf1db87ad fix typo (#12543) 2023-05-12 09:55:22 +02:00
TAN Long
b0e5aed5ed perf(REL_OP): Replace some token.children with token.rights or token.lefts (#12528)
Co-authored-by: Tan Long <tanloong@foxmail.com>
2023-05-12 09:55:22 +02:00
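The idea behind b0e5aed5ed can be sketched on a toy tree: when only the dependents on one side of a token are needed, reading them directly avoids building and filtering the full child list. `Node` and both helper functions are hypothetical illustrations, not spaCy's `Token` API or the matcher's actual code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical token with separately stored left and right dependents."""

    i: int
    lefts: List["Node"] = field(default_factory=list)
    rights: List["Node"] = field(default_factory=list)

    @property
    def children(self) -> List["Node"]:
        # Builds a combined list on every access.
        return self.lefts + self.rights


def right_children_via_filter(node: Node) -> List[int]:
    # Collect all children, then discard those to the left.
    return [c.i for c in node.children if c.i > node.i]


def right_children_direct(node: Node) -> List[int]:
    # Read only the dependents that are already known to be on the right.
    return [c.i for c in node.rights]


head = Node(2, lefts=[Node(0), Node(1)], rights=[Node(3), Node(4)])
filtered = right_children_via_filter(head)
direct = right_children_direct(head)
```

Both functions return the same indices; the direct version simply skips the work of visiting the left-hand dependents.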
TAN Long
6be67db59f docs(REL_OP): modify docs for REL_OPs to match Semgrex's update on CoreNLP v4.5.2 (#12531)
Co-authored-by: Tan Long <tanloong@foxmail.com>
2023-05-12 09:55:22 +02:00
andyjessen
18a2a88a95 Add category to spaCy project (#12506)
ScispaCy fits within biomedical domain. Consider adding this category.
2023-05-12 09:55:22 +02:00
Adriane Boyd
aea4a96f92 Set version to v3.5.2 (#12508) 2023-04-06 17:30:39 +02:00
Adriane Boyd
e4bbdf7b50 Merge pull request #12494 from adrianeboyd/backport/v3.5.2-1
Backports for v3.5.2
2023-04-06 16:18:59 +02:00
Madeesh Kannan
f66d55fe5b Docs: Fix rule-based matching example that expands named entities (#12495) 2023-04-06 11:48:04 +02:00
Edward
9fbb8ee912 Add more information to custom code docs (#12491)
* Add info to sections

* Update website/docs/usage/training.mdx

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-04-06 11:48:04 +02:00
Will Frey
314a7cea73 Fix invalid ConsoleLogger.v3 example config (#12498)
Replace `progress_bar = "all_steps"` with `progress_bar = "eval"`, which is consistent with the default behavior for `spacy.ConsoleLogger.v1` and `spacy.ConsoleLogger.v2`.
2023-04-06 11:48:04 +02:00
Edward
2fbd080a03 Add model-last saving mechanism to pretraining (#12459)
* Adjust pretrain command

* change naming and add finally block

* Add unit test

* Add unit test assertions

* Update spacy/training/pretrain.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* change finally block

* Add to docs

* Update website/docs/usage/embeddings-transformers.mdx

* Add flag to skip saving model-last

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-04-03 15:28:52 +02:00
Adriane Boyd
bbf232e355 Add Span.kb_id/Span.id strings to Doc/DocBin serialization if set (#12493)
* Add Span.kb_id/Span.id strings to Doc/DocBin serialization if set

* Format
2023-04-03 15:28:52 +02:00
Adriane Boyd
0ec4dc5c29 Remove redundant strings.add for Doc.char_span (#12429) 2023-04-03 15:28:52 +02:00
Adriane Boyd
a5406a6c45 Allow cupy 12.0 for extras (#12490) 2023-04-03 15:28:52 +02:00
Adriane Boyd
57ee1212de Fix pickle for ngram suggester (#12486) 2023-04-03 15:28:52 +02:00
Ye Lei (叶磊)
b228875600 Allow passing a Span to displacy.parse_deps (#12477)
* Allow passing a Span to displacy.parse_deps

* Update docstring

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update API docs

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-04-03 15:28:52 +02:00
Raphael Mitsch
8d064872ff Fix Span.sents for edge case of Span being the only Span in the last sentence of a Doc. (#12484) 2023-04-03 15:28:52 +02:00
kadarakos
26da226a39 Fix spancat-singlelabel score (#12469)
* debug argmax sort and add span scores

* add missing tests for spanscores
2023-04-03 15:28:52 +02:00
Edward
888332dfb2 Add info to stringstore and vocab (#12471) 2023-04-03 15:28:52 +02:00
Adriane Boyd
1b4a67bc54 Restrict github workflows to explosion (#12470) 2023-04-03 15:28:52 +02:00
sloev / Johannes Valbjørn
79dcef17f7 add spacy_onnx_sentiment_english to universe (#12422)
* add spacy_onnx_sentiment_english to universe

* rename to sentimental-onix

* fix comma json error

* fix typo

* typo fix

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* mention need to download model before example works

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-04-03 15:28:52 +02:00
Prajakta Darade
0ecbeff1a6 corrected example code (#12466) 2023-04-03 15:28:52 +02:00
kadarakos
4380d750f9 add explanation about overwriting behaviour (#12464)
* add explanation about overwriting behaviour

* Update website/docs/api/spancategorizer.mdx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/spancategorizer.mdx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/spancategorizer.mdx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* format

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-04-03 15:28:52 +02:00
Adriane Boyd
2953e7b7ce Support floret for PretrainVectors (#12435)
* Support floret for PretrainVectors

* Format
2023-04-03 15:28:52 +02:00
Ines Montani
d2d9e9e139 Add user survey alert to the top (#12452)
* Add user survey alert to the top

* Shorter

---------

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-04-03 15:28:52 +02:00
Adriane Boyd
f1a42b6fcc CI: Separate spacy universe validation into a separate workflow (#12440)
* Separate spacy universe validation into a separate workflow

* Fix new workflow name
2023-04-03 15:28:52 +02:00
Adriane Boyd
f9c0220ea5 CI: Switch PR back to paths-ignore (#12438)
Switch PR tests back to paths-ignore but include changes to `.github`
for all PRs rather than trying to figure out complicated
includes+excludes.  Changes to `.github` are relatively rare and should
not be a huge burden for the CI.
2023-04-03 15:28:52 +02:00
Adriane Boyd
6183906a0b Remove autoblack workflow (#12437)
Now that all PRs have `black` formatting validation, we no longer need the
autoblack workflow.
2023-04-03 15:28:52 +02:00