royashcenazi
931a46308f
Parsigs universe 3 ( #12617 )
...
* parsigs universe
* added model installation explanation in the description
* Update website/meta/universe.json
Co-authored-by: Basile Dura <bdura@users.noreply.github.com>
* added model installation instruction in the code example
* added biomedical category
---------
Co-authored-by: Basile Dura <bdura@users.noreply.github.com>
2023-05-12 09:55:22 +02:00
royashcenazi
7c49d251c7
parsigs universe ( #12616 )
...
* parsigs universe
* added model installation explanation in the description
* Update website/meta/universe.json
Co-authored-by: Basile Dura <bdura@users.noreply.github.com>
* added model installation instruction in the code example
---------
Co-authored-by: Basile Dura <bdura@users.noreply.github.com>
2023-05-12 09:55:22 +02:00
David Berenstein
81488fa88b
chore: added adept-augmentations to the spacy universe ( #12609 )
...
* chore: added adept-augmentations to the spacy universe
* Apply suggestions from code review
Co-authored-by: Basile Dura <bdura@users.noreply.github.com>
* Update universe.json
---------
Co-authored-by: Basile Dura <bdura@users.noreply.github.com>
2023-05-12 09:55:22 +02:00
Patrick J. Burns
54d9198e62
Fix typo ( #12615 )
2023-05-12 09:55:22 +02:00
Patrick J. Burns
7ae4fc19a1
Add LatinCy models to universe.json ( #12597 )
...
* Add LatinCy models to universe.json
* Update website/meta/universe.json
Add install code for LatinCy models to 'code_example'
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update LatinCy ‘code_example’ in website/meta/universe.json
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
---------
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-05-12 09:55:22 +02:00
Adriane Boyd
2cfbc1209d
In initialize only calculate current vectors hash if needed ( #12607 )
2023-05-12 09:55:22 +02:00
Adriane Boyd
42e5043816
Remove #egg from download URLs ( #12567 )
...
The current URLs will become invalid in pip 25.0. According to the pip
docs, the egg= URLs are currently only needed for editable VCS installs.
2023-05-12 09:55:22 +02:00
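For readers who maintain their own download URLs, the change described in #12567 can be illustrated with a minimal sketch. The helper name `strip_egg_fragment` and the example URL are hypothetical and are not taken from the spaCy code base; the sketch only shows one way to drop an `#egg=` fragment while leaving other fragments (such as hashes) untouched.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_egg_fragment(url: str) -> str:
    """Return `url` without an '#egg=...' fragment (hypothetical helper).

    Fragments that do not start with 'egg=' are left in place.
    """
    parts = urlsplit(url)
    if parts.fragment.startswith("egg="):
        parts = parts._replace(fragment="")
    return urlunsplit(parts)


# Example URL is made up for illustration only.
cleaned = strip_egg_fragment("https://example.com/pkg-1.0.0.tar.gz#egg=pkg==1.0.0")
print(cleaned)
```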
Kenneth Enevoldsen
4e1db35f6e
Update inmemorylookupkb.mdx ( #12586 )
...
Example does not refer to the in memory lookup
2023-05-12 09:55:22 +02:00
Lj Miranda
9ec12fcfde
Add spans in spacy benchmark ( #12575 )
...
* Add spans in spacy benchmark
The current implementation of spaCy benchmark accuracy / spacy evaluate
doesn't include the "spans" type, so calling the command doesn't render
the HTML displaCy file needed.
This PR attempts to fix that by creating a new parameter for "spans"
and calling the appropriate displaCy value.
* Reformat file with black
* Add tests for evaluate
* Fix spans -> span for displacy style
* Update test to check render instead
* Update source so mypy passes
* Add parser information to avoid warnings
2023-05-12 09:55:22 +02:00
Adriane Boyd
139368d9ce
CI: Only run test suite once with thinc-apple-ops for macos python 3.11 ( #12436 )
...
* CI: Only run test suite once with thinc-apple-ops for macos python 3.11
* Adjust syntax
* Try alternate syntax
* Try alternate syntax
* Try alternate syntax
2023-05-12 09:55:22 +02:00
kadarakos
0de1f8bf73
Spancat speed improvement ( #12577 )
...
* avoid nesting then flattening
* mypy fix
* Apply suggestions from code review
* Add type for indices
* Run full matrix for mypy
* Add back modified type: ignore
* Revert "Run full matrix for mypy"
This reverts commit e218873d04.
---------
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-05-12 09:55:22 +02:00
Victoria
1f8f910554
Add spacy-wasm to universe ( #12572 )
...
* add spacy-wasm to universe
* add tag
2023-05-12 09:55:22 +02:00
moxley01
e9945ccd04
add spacysee project ( #12568 )
2023-05-12 09:55:22 +02:00
Adriane Boyd
664a53ffbe
CI: Disable Azure ( #12560 )
2023-05-12 09:55:22 +02:00
Adriane Boyd
e05b2ccc7c
Add default option to MorphAnalysis.get ( #12545 )
...
* Add default to MorphAnalysis.get
Similar to `dict`, allow a `default` option for `MorphAnalysis.get` for
the user to provide a default return value if the field is not found.
The default return value remains `[]`, which is not the same as
`dict.get`, but is already established as this method's default return
value with the return type `List[str]`. However the new `default` option
does not enforce that the user-provided default is actually `List[str]`.
* Restore test case
2023-05-12 09:55:22 +02:00
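The semantics described in #12545 can be sketched with a small stand-in class. `ToyMorph` is hypothetical and written in plain Python; it is not spaCy's `MorphAnalysis`, and it only mirrors the behaviour stated in the commit message: a missing field yields `[]` unless the caller passes `default`, and the type of `default` is not checked.

```python
from typing import Any, Dict, List


class ToyMorph:
    """Hypothetical stand-in for MorphAnalysis (not spaCy's implementation)."""

    def __init__(self, feats: Dict[str, List[str]]) -> None:
        self._feats = feats

    def get(self, field: str, default: Any = None) -> Any:
        values = self._feats.get(field, [])
        if not values:
            # Unlike dict.get, the fallback is [] rather than None.
            return [] if default is None else default
        return list(values)


morph = ToyMorph({"Number": ["Sing"], "Case": ["Nom"]})

found = morph.get("Number")                            # field present
missing = morph.get("Tense")                           # field absent, no default
with_default = morph.get("Tense", default=["Pres"])    # field absent, list default
unchecked = morph.get("Tense", default="n/a")          # default type is not enforced
```

The last call shows the caveat from the commit message: nothing stops a caller from passing a default that is not a list of strings.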
Adriane Boyd
357fdd4871
Load exceptions last in Tokenizer.from_bytes ( #12553 )
...
In `Tokenizer.from_bytes`, the exceptions should be loaded last so that
they are only processed once as part of loading the model.
The exceptions are tokenized as phrase matcher patterns in the
background and the internal tokenization needs to be synced with all the
remaining tokenizer settings. If the exceptions are not loaded last,
there are speed regressions for `Tokenizer.from_bytes/disk` vs.
`Tokenizer.add_special_case` as the caches are reloaded more than
necessary during deserialization.
2023-05-12 09:55:22 +02:00
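The reasoning in #12553 can be illustrated with a toy model. `ToyTokenizer` is hypothetical and far simpler than spaCy's `Tokenizer`; it assumes only that every settings change after exceptions are present forces the exceptions to be reprocessed, and it counts how often that happens for two load orders.

```python
class ToyTokenizer:
    """Hypothetical toy (not spaCy's Tokenizer) that counts exception rebuilds."""

    def __init__(self) -> None:
        self.settings = {}
        self.exceptions = {}
        self.rebuilds = 0

    def set_setting(self, name, value) -> None:
        self.settings[name] = value
        # Assumption: a changed setting invalidates already-processed exceptions.
        if self.exceptions:
            self.rebuilds += 1

    def set_exceptions(self, exceptions) -> None:
        self.exceptions = dict(exceptions)
        self.rebuilds += 1


def load(order):
    """Apply serialized fields in the given order and return the rebuild count."""
    data = {
        "prefix": "p",
        "suffix": "s",
        "infix": "i",
        "exceptions": {"don't": ["do", "n't"]},
    }
    tok = ToyTokenizer()
    for key in order:
        if key == "exceptions":
            tok.set_exceptions(data[key])
        else:
            tok.set_setting(key, data[key])
    return tok.rebuilds


exceptions_first = load(["exceptions", "prefix", "suffix", "infix"])
exceptions_last = load(["prefix", "suffix", "infix", "exceptions"])
print(exceptions_first, exceptions_last)
```

In this toy, loading the exceptions first processes them once per subsequent setting, while loading them last processes them exactly once.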
Sofie Van Landeghem
7bf1db87ad
fix typo ( #12543 )
2023-05-12 09:55:22 +02:00
TAN Long
b0e5aed5ed
perf(REL_OP): Replace some token.children with token.rights or token.lefts ( #12528 )
...
Co-authored-by: Tan Long <tanloong@foxmail.com>
2023-05-12 09:55:22 +02:00
TAN Long
6be67db59f
docs(REL_OP): modify docs for REL_OPs to match Semgrex's update on CoreNLP v4.5.2 ( #12531 )
...
Co-authored-by: Tan Long <tanloong@foxmail.com>
2023-05-12 09:55:22 +02:00
andyjessen
18a2a88a95
Add category to spaCy project ( #12506 )
...
ScispaCy fits within biomedical domain. Consider adding this category.
2023-05-12 09:55:22 +02:00
Adriane Boyd
aea4a96f92
Set version to v3.5.2 ( #12508 )
2023-04-06 17:30:39 +02:00
Adriane Boyd
e4bbdf7b50
Merge pull request #12494 from adrianeboyd/backport/v3.5.2-1
...
Backports for v3.5.2
2023-04-06 16:18:59 +02:00
Madeesh Kannan
f66d55fe5b
Docs: Fix rule-based matching example that expands named entities ( #12495 )
2023-04-06 11:48:04 +02:00
Edward
9fbb8ee912
Add more information to custom code docs ( #12491 )
...
* Add info to sections
* Update website/docs/usage/training.mdx
---------
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-04-06 11:48:04 +02:00
Will Frey
314a7cea73
Fix invalid ConsoleLogger.v3 example config ( #12498 )
...
Replace `progress_bar = "all_steps"` with `progress_bar = "eval"`, which is consistent with the default behavior for `spacy.ConsoleLogger.v1` and `spacy.ConsoleLogger.v2`.
2023-04-06 11:48:04 +02:00
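A minimal sketch of the corrected setting from #12498 follows. The section name `[training.logger]` is an assumption about where the logger block lives, and the standard-library `configparser` is used only to keep the example self-contained; spaCy reads its configs with its own config system, not with `configparser`.

```python
import configparser

# Assumed section name; only the logger name and the progress_bar value
# come from the commit message.
CONFIG_TEXT = """
[training.logger]
@loggers = "spacy.ConsoleLogger.v3"
progress_bar = "eval"
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

# configparser keeps the surrounding quotes, so strip them for comparison.
logger_name = parser["training.logger"]["@loggers"].strip('"')
progress_bar = parser["training.logger"]["progress_bar"].strip('"')
print(logger_name, progress_bar)
```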
Edward
2fbd080a03
Add model-last saving mechanism to pretraining ( #12459 )
...
* Adjust pretrain command
* change naming and add finally block
* Add unit test
* Add unit test assertions
* Update spacy/training/pretrain.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* change finally block
* Add to docs
* Update website/docs/usage/embeddings-transformers.mdx
* Add flag to skip saving model-last
---------
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-04-03 15:28:52 +02:00
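The idea behind #12459 (save a final checkpoint from a `finally` block, with a flag to skip it) can be sketched as follows. `toy_pretrain`, its parameters, and the file names are hypothetical; this is not the code in `spacy/training/pretrain.py`, only an illustration of the pattern under those assumptions.

```python
import tempfile
from pathlib import Path


def toy_pretrain(output_dir: Path, n_epochs: int, fail_at=None, skip_last=False):
    """Hypothetical loop: one checkpoint per epoch, plus model-last on exit."""
    state = b""
    try:
        for epoch in range(n_epochs):
            if epoch == fail_at:
                raise RuntimeError("simulated interruption")
            state = f"weights after epoch {epoch}".encode()
            (output_dir / f"model{epoch}.bin").write_bytes(state)
    finally:
        # Runs on normal completion and on interruption alike.
        if not skip_last and state:
            (output_dir / "model-last.bin").write_bytes(state)


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    try:
        toy_pretrain(out, n_epochs=5, fail_at=2)
    except RuntimeError:
        pass
    # The most recent completed epoch is still available as model-last.
    last_bytes = (out / "model-last.bin").read_bytes()

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    toy_pretrain(out, n_epochs=2, skip_last=True)
    skipped_exists = (out / "model-last.bin").exists()
```

Putting the save in `finally` means an interrupted run still leaves a usable last checkpoint, while the skip flag keeps the previous behaviour available.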
Adriane Boyd
bbf232e355
Add Span.kb_id/Span.id strings to Doc/DocBin serialization if set ( #12493 )
...
* Add Span.kb_id/Span.id strings to Doc/DocBin serialization if set
* Format
2023-04-03 15:28:52 +02:00
Adriane Boyd
0ec4dc5c29
Remove redundant strings.add for Doc.char_span ( #12429 )
2023-04-03 15:28:52 +02:00
Adriane Boyd
a5406a6c45
Allow cupy 12.0 for extras ( #12490 )
2023-04-03 15:28:52 +02:00
Adriane Boyd
57ee1212de
Fix pickle for ngram suggester ( #12486 )
2023-04-03 15:28:52 +02:00
Ye Lei (叶磊)
b228875600
Allow passing a Span to displacy.parse_deps ( #12477 )
...
* Allow passing a Span to displacy.parse_deps
* Update docstring
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update API docs
---------
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-04-03 15:28:52 +02:00
Raphael Mitsch
8d064872ff
Fix Span.sents for edge case of Span being the only Span in the last sentence of a Doc. ( #12484 )
2023-04-03 15:28:52 +02:00
kadarakos
26da226a39
Fix spancat-singlelabel score ( #12469 )
...
* debug argmax sort and add span scores
* add missing tests for spanscores
2023-04-03 15:28:52 +02:00
Edward
888332dfb2
Add info to stringstore and vocab ( #12471 )
2023-04-03 15:28:52 +02:00
Adriane Boyd
1b4a67bc54
Restrict github workflows to explosion ( #12470 )
2023-04-03 15:28:52 +02:00
sloev / Johannes Valbjørn
79dcef17f7
add spacy_onnx_sentiment_english to universe ( #12422 )
...
* add spacy_onnx_sentiment_english to universe
* rename to sentimental-onix
* fix comma json error
* fix typo
* typo fix
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* mention need to download model before example works
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
---------
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-04-03 15:28:52 +02:00
Prajakta Darade
0ecbeff1a6
corrected example code ( #12466 )
2023-04-03 15:28:52 +02:00
kadarakos
4380d750f9
add explanation about overwriting behaviour ( #12464 )
...
* add explanation about overwriting behaviour
* Update website/docs/api/spancategorizer.mdx
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update website/docs/api/spancategorizer.mdx
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update website/docs/api/spancategorizer.mdx
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* format
---------
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-04-03 15:28:52 +02:00
Adriane Boyd
2953e7b7ce
Support floret for PretrainVectors ( #12435 )
...
* Support floret for PretrainVectors
* Format
2023-04-03 15:28:52 +02:00
Ines Montani
d2d9e9e139
Add user survey alert to the top ( #12452 )
...
* Add user survey alert to the top
* Shorter
---------
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-04-03 15:28:52 +02:00
Adriane Boyd
f1a42b6fcc
CI: Separate spacy universe validation into a separate workflow ( #12440 )
...
* Separate spacy universe validation into a separate workflow
* Fix new workflow name
2023-04-03 15:28:52 +02:00
Adriane Boyd
f9c0220ea5
CI: Switch PR back to paths-ignore ( #12438 )
...
Switch PR tests back to paths-ignore but include changes to `.github`
for all PRs rather than trying to figure out complicated
includes+excludes. Changes to `.github` are relatively rare and should
not be a huge burden for the CI.
2023-04-03 15:28:52 +02:00
Adriane Boyd
6183906a0b
Remove autoblack workflow ( #12437 )
...
Now that all PRs have `black` formatting validation, we no longer need the
autoblack workflow.
2023-04-03 15:28:52 +02:00
Raphael Mitsch
bd0768c05c
Fix EL failure with sentence-crossing entities ( #12398 )
...
* Add test reproducing EL failure in sentence-crossing entities.
* Format.
* Draft fix.
* Format.
* Fix case for len(ent.sents) == 1.
* Format.
* Format.
* Format.
* Fix mypy error.
* Merge EL sentence crossing tests.
* Remove unneeded sentencizer component.
* Fix or ignore mypy issues in test.
* Simplify ent.sents handling.
* Format. Update assert in ent.sents handling.
* Small rewrite
---------
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-04-03 15:28:52 +02:00
Adriane Boyd
be644caa13
Fix --verbose for spacy find-threshold ( #12418 )
2023-04-03 15:28:52 +02:00
Adriane Boyd
7880da952b
CI: Add all paths before excluding patterns ( #12419 )
2023-04-03 15:28:52 +02:00
Raphael Mitsch
545218a7d9
Fix sentence indexing bug in Span.sents ( #12405 )
...
* Add test for partial sentences in ent.sents.
* Removed unneeded import.
* Format. Simplify code.
2023-04-03 15:28:52 +02:00
Adriane Boyd
d00e58d1ac
CI: Move CLI tests to ubuntu for speed ( #12409 )
2023-04-03 15:28:52 +02:00
Adriane Boyd
9ca67dc539
Fix thinc-apple-ops test to run for python 3.11 ( #12408 )
2023-04-03 15:28:52 +02:00
Adriane Boyd
ed83cafe46
CI: Move universe validation to validate job ( #12406 )
...
* CI: Move universe validation to validate job
* Fix indentation
* Update step name
2023-04-03 15:28:52 +02:00