15900 Commits

Author SHA1 Message Date
royashcenazi
931a46308f Parsigs universe 3 (#12617)
* parsigs universe

* added model installation explanation in the description

* Update website/meta/universe.json

Co-authored-by: Basile Dura <bdura@users.noreply.github.com>

* added model installation instruction in the code example

* added biomedical category

---------

Co-authored-by: Basile Dura <bdura@users.noreply.github.com>
2023-05-12 09:55:22 +02:00
royashcenazi
7c49d251c7 parsigs universe (#12616)
* parsigs universe

* added model installation explanation in the description

* Update website/meta/universe.json

Co-authored-by: Basile Dura <bdura@users.noreply.github.com>

* added model installation instruction in the code example

---------

Co-authored-by: Basile Dura <bdura@users.noreply.github.com>
2023-05-12 09:55:22 +02:00
David Berenstein
81488fa88b chore: added adept-augmentations to the spacy universe (#12609)
* chore: added adept-augmentations to the spacy universe

* Apply suggestions from code review

Co-authored-by: Basile Dura <bdura@users.noreply.github.com>

* Update universe.json

---------

Co-authored-by: Basile Dura <bdura@users.noreply.github.com>
2023-05-12 09:55:22 +02:00
Patrick J. Burns
54d9198e62 Fix typo (#12615) 2023-05-12 09:55:22 +02:00
Patrick J. Burns
7ae4fc19a1 Add LatinCy models to universe.json (#12597)
* Add LatinCy models to universe.json

* Update website/meta/universe.json

Add install code for LatinCy models to 'code_example'

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update LatinCy ‘code_example’ in website/meta/universe.json

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-05-12 09:55:22 +02:00
Adriane Boyd
2cfbc1209d In initialize only calculate current vectors hash if needed (#12607) 2023-05-12 09:55:22 +02:00
Adriane Boyd
42e5043816 Remove #egg from download URLs (#12567)
The current URLs will become invalid in pip 25.0. According to the pip
docs, the egg= URLs are currently only needed for editable VCS installs.
2023-05-12 09:55:22 +02:00
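The effect of the `#egg` removal in 42e5043816 on a single URL can be sketched as follows. This is only an illustration of the URL shape before and after, not the code changed in that commit; the helper name `strip_egg_fragment` and the example URL are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_egg_fragment(url: str) -> str:
    """Drop a '#egg=...' fragment from a download URL; keep any other fragment."""
    parts = urlsplit(url)
    if parts.fragment.startswith("egg="):
        parts = parts._replace(fragment="")
    return urlunsplit(parts)


# Hypothetical URL, for illustration only.
example = "https://example.com/models/en_model-1.0.0.tar.gz#egg=en_model==1.0.0"
print(strip_egg_fragment(example))
# prints: https://example.com/models/en_model-1.0.0.tar.gz
```

Fragments that do not start with `egg=` (for instance a hash fragment) are left untouched in this sketch.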
Kenneth Enevoldsen
4e1db35f6e Update inmemorylookupkb.mdx (#12586)
Example does not refer to the in memory lookup
2023-05-12 09:55:22 +02:00
Lj Miranda
9ec12fcfde Add spans in spacy benchmark (#12575)
* Add spans in spacy benchmark

The current implementation of spaCy benchmark accuracy / spacy evaluate
doesn't include the "spans" type, so calling the command doesn't render
the HTML displaCy file needed.

This PR attempts to fix that by creating a new parameter for "spans"
and calling the appropriate displaCy value.

* Reformat file with black

* Add tests for evaluate

* Fix spans -> span for displacy style

* Update test to check render instead

* Update source so mypy passes

* Add parser information to avoid warnings
2023-05-12 09:55:22 +02:00
Adriane Boyd
139368d9ce CI: Only run test suite once with thinc-apple-ops for macos python 3.11 (#12436)
* CI: Only run test suite once with thinc-apple-ops for macos python 3.11

* Adjust syntax

* Try alternate syntax

* Try alternate syntax

* Try alternate syntax
2023-05-12 09:55:22 +02:00
kadarakos
0de1f8bf73 Spancat speed improvement (#12577)
* avoid nesting then flattening

* mypy fix

* Apply suggestions from code review

* Add type for indices

* Run full matrix for mypy

* Add back modified type: ignore

* Revert "Run full matrix for mypy"

This reverts commit e218873d04.

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-05-12 09:55:22 +02:00
Victoria
1f8f910554 Add spacy-wasm to universe (#12572)
* add spacy-wasm to universe

* add tag
2023-05-12 09:55:22 +02:00
moxley01
e9945ccd04 add spacysee project (#12568) 2023-05-12 09:55:22 +02:00
Adriane Boyd
664a53ffbe CI: Disable Azure (#12560) 2023-05-12 09:55:22 +02:00
Adriane Boyd
e05b2ccc7c Add default option to MorphAnalysis.get (#12545)
* Add default to MorphAnalysis.get

Similar to `dict`, allow a `default` option for `MorphAnalysis.get` for
the user to provide a default return value if the field is not found.
The default return value remains `[]`, which is not the same as
`dict.get`, but is already established as this method's default return
value with the return type `List[str]`. However the new `default` option
does not enforce that the user-provided default is actually `List[str]`.

* Restore test case
2023-05-12 09:55:22 +02:00
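The `default` behaviour described in e05b2ccc7c can be sketched with a small stand-in class. The class `MorphStandIn` below is hypothetical and is not spaCy's `MorphAnalysis`; it only mirrors the semantics stated in the commit message: a missing field returns `[]` unless the caller passes a `default`, and the type of that default is not checked.

```python
from typing import Any, Dict, List


class MorphStandIn:
    """Toy stand-in for a morphological analysis (NOT spaCy's MorphAnalysis)."""

    def __init__(self, features: str) -> None:
        # Parse a "Case=Nom|Number=Sing" style string into field -> values.
        self._fields: Dict[str, List[str]] = {}
        for item in filter(None, features.split("|")):
            field, _, values = item.partition("=")
            self._fields[field] = values.split(",")

    def get(self, field: str, default: Any = None) -> Any:
        if field in self._fields:
            return self._fields[field]
        # Unlike dict.get, a missing field yields [] when no default is given.
        # The user-provided default is returned as-is, without type checks.
        return [] if default is None else default


morph = MorphStandIn("Case=Nom|Number=Sing")
print(morph.get("Case"))              # prints: ['Nom']
print(morph.get("Tense"))             # prints: []
print(morph.get("Tense", ["Unset"]))  # prints: ['Unset']
print({}.get("Tense"))                # prints: None (dict.get, for contrast)
```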
Adriane Boyd
357fdd4871 Load exceptions last in Tokenizer.from_bytes (#12553)
In `Tokenizer.from_bytes`, the exceptions should be loaded last so that
they are only processed once as part of loading the model.

The exceptions are tokenized as phrase matcher patterns in the
background and the internal tokenization needs to be synced with all the
remaining tokenizer settings. If the exceptions are not loaded last,
there are speed regressions for `Tokenizer.from_bytes/disk` vs.
`Tokenizer.add_special_case` as the caches are reloaded more than
necessary during deserialization.
2023-05-12 09:55:22 +02:00
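The load-order argument in 357fdd4871 can be illustrated with a toy model. Everything below is an assumption made for illustration: `ToyTokenizer` is not spaCy's `Tokenizer`, and its counter simply records how often exceptions are re-processed when every setting change triggers a reload.

```python
from typing import Dict


class ToyTokenizer:
    """Toy model: each setting change re-processes all loaded exceptions."""

    def __init__(self) -> None:
        self.settings: Dict[str, str] = {}
        self.exceptions: Dict[str, str] = {}
        self.reprocess_count = 0

    def set_setting(self, name: str, value: str) -> None:
        self.settings[name] = value
        self._reload_exceptions()

    def set_exceptions(self, exceptions: Dict[str, str]) -> None:
        self.exceptions = dict(exceptions)
        self._reload_exceptions()

    def _reload_exceptions(self) -> None:
        # Every loaded exception is processed again with the current settings.
        self.reprocess_count += len(self.exceptions)


def load(settings: Dict[str, str], exceptions: Dict[str, str], exceptions_last: bool) -> int:
    """Deserialize into a fresh ToyTokenizer and return the re-processing count."""
    tokenizer = ToyTokenizer()
    if not exceptions_last:
        tokenizer.set_exceptions(exceptions)
    for name, value in settings.items():
        tokenizer.set_setting(name, value)
    if exceptions_last:
        tokenizer.set_exceptions(exceptions)
    return tokenizer.reprocess_count


settings = {"prefix": "p", "suffix": "s", "infix": "i"}
exceptions = {"don't": "do n't", "U.S.": "U.S."}
print(load(settings, exceptions, exceptions_last=False))  # prints: 8
print(load(settings, exceptions, exceptions_last=True))   # prints: 2
```

In this model, loading the exceptions last processes each one exactly once, while loading them first repeats the work for every subsequent setting.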
Sofie Van Landeghem
7bf1db87ad fix typo (#12543) 2023-05-12 09:55:22 +02:00
TAN Long
b0e5aed5ed perf(REL_OP): Replace some token.children with token.rights or token.lefts (#12528)
Co-authored-by: Tan Long <tanloong@foxmail.com>
2023-05-12 09:55:22 +02:00
TAN Long
6be67db59f docs(REL_OP): modify docs for REL_OPs to match Semgrex's update on CoreNLP v4.5.2 (#12531)
Co-authored-by: Tan Long <tanloong@foxmail.com>
2023-05-12 09:55:22 +02:00
andyjessen
18a2a88a95 Add category to spaCy project (#12506)
ScispaCy fits within biomedical domain. Consider adding this category.
2023-05-12 09:55:22 +02:00
Adriane Boyd
aea4a96f92 Set version to v3.5.2 (#12508) 2023-04-06 17:30:39 +02:00
Adriane Boyd
e4bbdf7b50 Merge pull request #12494 from adrianeboyd/backport/v3.5.2-1
Backports for v3.5.2
2023-04-06 16:18:59 +02:00
Madeesh Kannan
f66d55fe5b Docs: Fix rule-based matching example that expands named entities (#12495) 2023-04-06 11:48:04 +02:00
Edward
9fbb8ee912 Add more information to custom code docs (#12491)
* Add info to sections

* Update website/docs/usage/training.mdx

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-04-06 11:48:04 +02:00
Will Frey
314a7cea73 Fix invalid ConsoleLogger.v3 example config (#12498)
Replace `progress_bar = "all_steps"` with `progress_bar = "eval"`, which is consistent with the default behavior for `spacy.ConsoleLogger.v1` and `spacy.ConsoleLogger.v2`.
2023-04-06 11:48:04 +02:00
Edward
2fbd080a03 Add model-last saving mechanism to pretraining (#12459)
* Adjust pretrain command

* change naming and add finally block

* Add unit test

* Add unit test assertions

* Update spacy/training/pretrain.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* change finally block

* Add to docs

* Update website/docs/usage/embeddings-transformers.mdx

* Add flag to skip saving model-last

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-04-03 15:28:52 +02:00
Adriane Boyd
bbf232e355 Add Span.kb_id/Span.id strings to Doc/DocBin serialization if set (#12493)
* Add Span.kb_id/Span.id strings to Doc/DocBin serialization if set

* Format
2023-04-03 15:28:52 +02:00
Adriane Boyd
0ec4dc5c29 Remove redundant strings.add for Doc.char_span (#12429) 2023-04-03 15:28:52 +02:00
Adriane Boyd
a5406a6c45 Allow cupy 12.0 for extras (#12490) 2023-04-03 15:28:52 +02:00
Adriane Boyd
57ee1212de Fix pickle for ngram suggester (#12486) 2023-04-03 15:28:52 +02:00
Ye Lei (叶磊)
b228875600 Allow passing a Span to displacy.parse_deps (#12477)
* Allow passing a Span to displacy.parse_deps

* Update docstring

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update API docs

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-04-03 15:28:52 +02:00
Raphael Mitsch
8d064872ff Fix Span.sents for edge case of Span being the only Span in the last sentence of a Doc. (#12484) 2023-04-03 15:28:52 +02:00
kadarakos
26da226a39 Fix spancat-singlelabel score (#12469)
* debug argmax sort and add span scores

* add missing tests for spanscores
2023-04-03 15:28:52 +02:00
Edward
888332dfb2 Add info to stringstore and vocab (#12471) 2023-04-03 15:28:52 +02:00
Adriane Boyd
1b4a67bc54 Restrict github workflows to explosion (#12470) 2023-04-03 15:28:52 +02:00
sloev / Johannes Valbjørn
79dcef17f7 add spacy_onnx_sentiment_english to universe (#12422)
* add spacy_onnx_sentiment_english to universe

* rename to sentimental-onix

* fix comma json error

* fix typo

* typo fix

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* mention need to download model before example works

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-04-03 15:28:52 +02:00
Prajakta Darade
0ecbeff1a6 corrected example code (#12466) 2023-04-03 15:28:52 +02:00
kadarakos
4380d750f9 add explanation about overwriting behaviour (#12464)
* add explanation about overwriting behaviour

* Update website/docs/api/spancategorizer.mdx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/spancategorizer.mdx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/spancategorizer.mdx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* format

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-04-03 15:28:52 +02:00
Adriane Boyd
2953e7b7ce Support floret for PretrainVectors (#12435)
* Support floret for PretrainVectors

* Format
2023-04-03 15:28:52 +02:00
Ines Montani
d2d9e9e139 Add user survey alert to the top (#12452)
* Add user survey alert to the top

* Shorter

---------

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-04-03 15:28:52 +02:00
Adriane Boyd
f1a42b6fcc CI: Separate spacy universe validation into a separate workflow (#12440)
* Separate spacy universe validation into a separate workflow

* Fix new workflow name
2023-04-03 15:28:52 +02:00
Adriane Boyd
f9c0220ea5 CI: Switch PR back to paths-ignore (#12438)
Switch PR tests back to paths-ignore but include changes to `.github`
for all PRs rather than trying to figure out complicated
includes+excludes.  Changes to `.github` are relatively rare and should
not be a huge burden for the CI.
2023-04-03 15:28:52 +02:00
Adriane Boyd
6183906a0b Remove autoblack workflow (#12437)
Now that all PRs have `black` formatting validation, we no longer need the
autoblack workflow.
2023-04-03 15:28:52 +02:00
Raphael Mitsch
bd0768c05c Fix EL failure with sentence-crossing entities (#12398)
* Add test reproducing EL failure in sentence-crossing entities.

* Format.

* Draft fix.

* Format.

* Fix case for len(ent.sents) == 1.

* Format.

* Format.

* Format.

* Fix mypy error.

* Merge EL sentence crossing tests.

* Remove unneeded sentencizer component.

* Fix or ignore mypy issues in test.

* Simplify ent.sents handling.

* Format. Update assert in ent.sents handling.

* Small rewrite

---------

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-04-03 15:28:52 +02:00
Adriane Boyd
be644caa13 Fix --verbose for spacy find-threshold (#12418) 2023-04-03 15:28:52 +02:00
Adriane Boyd
7880da952b CI: Add all paths before excluding patterns (#12419) 2023-04-03 15:28:52 +02:00
Raphael Mitsch
545218a7d9 Fix sentence indexing bug in Span.sents (#12405)
* Add test for partial sentences in ent.sents.

* Removed unneeded import.

* Format. Simplify code.
2023-04-03 15:28:52 +02:00
Adriane Boyd
d00e58d1ac CI: Move CLI tests to ubuntu for speed (#12409) 2023-04-03 15:28:52 +02:00
Adriane Boyd
9ca67dc539 Fix thinc-apple-ops test to run for python 3.11 (#12408) 2023-04-03 15:28:52 +02:00
Adriane Boyd
ed83cafe46 CI: Move universe validation to validate job (#12406)
* CI: Move universe validation to validate job

* Fix indentation

* Update step name
2023-04-03 15:28:52 +02:00