15115 Commits

Author SHA1 Message Date
Adriane Boyd
e2190f7914 Clean up warnings in the test suite (#11331) 2022-12-12 16:05:27 +01:00
Adriane Boyd
bf567cb8c9 Rename test helper method with non-test_ name (#11701) 2022-12-12 13:02:46 +01:00
Adriane Boyd
784fa07694 Cast to uint64 for all array-based doc representations (#11933)
* Convert all individual values explicitly to uint64 for array-based doc representations

* Temporarily test with latest numpy v1.24.0rc

* Remove unnecessary conversion from attr_t

* Reduce number of individual casts

* Convert specifically from int32 to uint64

* Revert "Temporarily test with latest numpy v1.24.0rc"

This reverts commit eb0e3c5006.

* Also use int32 in tests
2022-12-12 13:02:46 +01:00
Paul O'Leary McCann
3b1f552b72 Config generation fails for GPU without transformers (#11899)
If you don't have spacy-transformers installed, but try to use `init
config` with the GPU flag, you'll get an error. The issue is that the
`use_transformers` flag in the config is conflated with the GPU flag,
and then there's an attempt to access transformers config info that may
not exist.

There may be a better way to do this, but this stops the error.
2022-12-12 12:50:43 +01:00
Paul O'Leary McCann
eea195c42e Add in errors used in the beam code that were removed at some point (#11935)
I don't think there's any way to use the beam code at the moment, but as
long as it's around the errors it refers to should also be present.
2022-12-12 12:50:30 +01:00
Adriane Boyd
524f32be64 Add smart_open requirement, update deprecated options (#11864)
* Switch from deprecated `ignore_ext` to `compression`
* Add upload/download test for local files
2022-12-12 12:49:50 +01:00
Adriane Boyd
9d23ebc891 Fix spancat for zero suggestions (#11860)
* Add test for spancat predict with zero suggestions

* Fix spancat for zero suggestions

* Undo changes to extract_spans

* Use .sum() as in update
2022-12-12 12:47:32 +01:00
Adriane Boyd
d25391df5e Revert "Add click pin to avoid typer issues (#10573)"
This reverts commit 9966e08f32.
2022-12-12 12:46:51 +01:00
Adriane Boyd
e97b07f19d Support env var for num build jobs (#11073) 2022-07-04 20:51:07 +02:00
Adriane Boyd
d9ad5392c5 Extend build constraints for aarch64 2022-07-04 13:32:04 +02:00
Adriane Boyd
e147a52398 Merge pull request #10581 from adrianeboyd/chore/v3.1.6
Typer workaround, set version to v3.1.6
2022-03-30 09:52:15 +02:00
Adriane Boyd
be1a1d7f28 Set version to v3.1.6 2022-03-30 08:38:51 +02:00
Adriane Boyd
53aa88a929 Add click pin to avoid typer issues (#10573) 2022-03-30 08:38:41 +02:00
Adriane Boyd
1355396051 Set version to v3.1.5 (#10388) 2022-02-28 12:54:14 +01:00
Adriane Boyd
c51c4534d8 Merge pull request #10356 from adrianeboyd/chore/backports-v3.1.5
Backports for v3.1.5
2022-02-28 08:59:13 +01:00
Adriane Boyd
2dc383ae1c Fix spancat for empty docs and zero suggestions (#9654)
* Fix spancat for empty docs and zero suggestions

* Use ops.xp.zeros in test
2022-02-22 18:11:43 +01:00
Adriane Boyd
c69a8756b6 Merge pull request #10345 from adrianeboyd/chore/v3.1-backport-10324
Fix Tok2Vec for empty batches (#10324)
2022-02-21 16:42:09 +01:00
Sofie Van Landeghem
5d0cc79940 fix type of lexeme.rank (#9979) 2022-02-21 15:21:46 +01:00
Adriane Boyd
900741401e Switch to latest CI images (#9773) 2022-02-21 15:00:37 +01:00
Daniël de Kok
fa8f03047d Pin mypy to 0.910 until there is a compatible pydantic version 2022-02-21 14:59:35 +01:00
Adriane Boyd
7c43f8a52d Fix Tok2Vec for empty batches (#10324)
* Add test for tok2vec with vectors and empty docs

* Add shortcut for empty batch in Tok2Vec.predict

* Avoid types
2022-02-21 14:30:35 +01:00
Bram Vanroy
cab9209c3d use metaclass to decorate errors (#9593) 2021-11-03 15:29:32 +01:00
Paul O'Leary McCann
c1cc94a33a Fix typo about receptive field size (#9564) 2021-11-03 15:16:55 +01:00
Adriane Boyd
e06bbf72a4 Fix tok2vec-less textcat generation in website quickstart (#9610) 2021-11-03 15:11:07 +01:00
Paul O'Leary McCann
e43639b27a Add note about round-trip serializing pipeline to API docs (#9583) 2021-11-03 09:55:30 +01:00
Lj Miranda
f1bc655a38 Add initial Tagalog (tl) tests (#9582)
* Add tl_tokenizer to test fixtures

* Add tagalog tests
2021-11-02 08:35:49 +01:00
xxyzz
90ec820f05 Add WordDumb to spaCy Universe (#9572)
* Add WordDumb to spaCy Universe

* Add standalone category

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2021-11-01 18:38:41 +09:00
Bruce W. Lee (이웅성)
a4dcb68cf6 Adding LingFeat Software to spaCy Universe. (#9574)
* add lingfeat in universe

* add lingfeat in universe

* Fix JSON

* Minor cleanup

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2021-11-01 18:38:14 +09:00
Vasundhara
5279c7c4ba Fix broken link to mappings-exceptions (#9573) 2021-10-31 13:44:29 +09:00
Paul O'Leary McCann
006df1ae1f Clarify error when words are of wrong type (#9541)
* Clarify error when words are of wrong type

See #9437

* Update docs

* Use try/except

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-10-29 12:08:40 +02:00
Paul O'Leary McCann
2fd8d616e7 Add docs section for spacy.cli.train.train (#9545)
* Add section for spacy.cli.train.train

* Add link from training page to train function

* Ensure path in train helper

* Update docs

Co-authored-by: Ines Montani <ines@ines.io>
2021-10-29 10:36:34 +02:00
Adriane Boyd
5477453ea3 Docs for thinc-apple-ops (#9549)
* Docs for thinc-apple-ops

* Ignore thinc-apple-ops in reqs tests

* Fix install quickstart

* Add cupy cuda 113, 114 extras

* Remove draft section

Co-authored-by: Ines Montani <ines@ines.io>
2021-10-29 10:35:31 +02:00
Philip Vollet
76173b0866 fixed typo and URL (#9560) 2021-10-29 13:57:44 +09:00
Adriane Boyd
72dc63b3fb Update for python 3.10 (#9519)
* Update for python 3.10

* Update mac image

* Update build constraints for python 3.10

* Add extras for cupy cuda 11.3-11.5

* Remove cupy-cuda115 extra

* Require thinc>=8.0.12

* Switch CI to windows-2019

* Skip mypy for python 3.10
2021-10-28 15:32:06 +02:00
Adriane Boyd
386dcada1c Address random results in slow readers tests (#9544)
* Set random seed for dataset shuffling
* Use more dev examples for non-zero scores
2021-10-26 16:53:10 +02:00
Elia Robyn Lake (Robyn Speer)
fa70837f28 clarify how to connect pretraining to training (#9450)
* clarify how to connect pretraining to training

Signed-off-by: Elia Robyn Speer <elia@explosion.ai>

* Update website/docs/usage/embeddings-transformers.md

* Update website/docs/usage/embeddings-transformers.md

* Update website/docs/usage/embeddings-transformers.md

* Update website/docs/usage/embeddings-transformers.md

Co-authored-by: Elia Robyn Speer <elia@explosion.ai>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-10-22 13:15:47 +02:00
github-actions[bot]
b0b115ff39 Auto-format code with black (#9530)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2021-10-22 13:03:10 +02:00
Sofie Van Landeghem
c9f28b6d08 Merge branch 'spacy.io' into master 2021-10-21 20:46:33 +02:00
Sofie Van Landeghem
c7ed631f3c bump version to 3.1.4 (#9524) 2021-10-21 20:34:57 +02:00
Daniël de Kok
f31ac6fd4f Print a warning when multiprocessing is used on a GPU (#9475)
* Raise an error when multiprocessing is used on a GPU

As reported in #5507, a confusing exception is thrown when
multiprocessing is used with a GPU model and the `fork` multiprocessing
start method:

cupy.cuda.runtime.CUDARuntimeError: cudaErrorInitializationError: initialization error

This change checks whether one of the models uses the GPU when
multiprocessing is used. If so, raise a friendly error message.

Even though multiprocessing can work on a GPU with the `spawn` method,
it quickly runs the GPU out-of-memory on real-world data. Also,
multiprocessing on a single GPU typically does not provide large
performance gains.

* Move GPU multiprocessing check to Language.pipe

* Warn rather than error when using multiprocessing with GPU models

* Improve GPU multiprocessing warning message.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Reduce API assumptions

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy/language.py

* Update spacy/language.py

* Test that warning is thrown with GPU + multiprocessing

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-10-21 16:14:23 +02:00
Sofie Van Landeghem
5a38f79f18 Custom component types in spacy.ty (#9469)
* add custom protocols in spacy.ty

* add a test for the new types in spacy.ty

* import Example when type checking

* some type fixes

* put Protocol in compat

* revert update check back to hasattr

* runtime_checkable in compat as well
2021-10-21 15:31:06 +02:00
Daniël de Kok
d0631e3005 Replace use_ops("numpy") by use_ops("cpu") in the parser (#9501)
* Replace use_ops("numpy") by use_ops("cpu") in the parser

This ensures that the best available CPU implementation is chosen
(e.g. Thinc Apple Ops on macOS).

* Run spaCy tests with thinc-apple-ops on macOS
2021-10-21 11:22:45 +02:00
Paul O'Leary McCann
28ecf399da Remove some old version refs in the docs (#9448)
* Remove some old version refs in the docs

* Remove warning

* Update spacy/matcher/matcher.pyx

* Remove all references to the punctuation warning

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-10-21 11:17:59 +02:00
Duygu Altinok
1ee4d6ef49 Corrected broken (#9505) 2021-10-20 18:07:28 +02:00
Philip Vollet
a31a4bb7bd Add projects to spaCy Universe (#9269)
* Added spaCy Universe projects

* Added user license agreement Philip Vollet

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update website/meta/universe.json

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-10-20 18:07:07 +02:00
Duygu Altinok
7b98aa4c16 Corrected broken (#9505) 2021-10-20 17:31:59 +02:00
Edward
014da12f1d Don't add tok2vec when efficiency textcat (#9502) 2021-10-20 17:30:19 +02:00
Ryn Daniels
ddc1bf5b8b Merge pull request #9518 from explosion/rfd-robot-slowtests
Enable the test_slow command for explosionbot
2021-10-20 12:44:20 +02:00
Daniël de Kok
1f05f56433 Add the spacy.models_with_nvtx_range.v1 callback (#9124)
* Add the spacy.models_with_nvtx_range.v1 callback

This callback recursively adds NVTX ranges to the Models in each pipe in
a pipeline.

* Fix create_models_with_nvtx_range type signature

* NVTX range: wrap models of all trainable pipes jointly

This avoids that (sub-)models that are shared between pipes get wrapped
twice.

* NVTX range callback: make color configurable

Add forward_color and backprop_color options to set the color for the
NVTX range.

* Move create_models_with_nvtx_range to spacy.ml

* Update create_models_with_nvtx_range for thinc changes

with_nvtx_range now updates an existing node, rather than returning a
wrapper node. So, we can simply walk over the nodes and update them.

* NVTX: use after_pipeline_creation in example
2021-10-20 11:59:48 +02:00
Ryn Daniels
66b474ce05 Merge branch 'master' into rfd-robot-slowtests 2021-10-20 11:56:01 +02:00