spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-10-02 18:06:46 +03:00

Author	SHA1	Message	Date
Paul O'Leary McCann	2e9dadfda4	Remove orphaned function. This was probably used in the prototyping stage, left as a reference, and then forgotten. Nothing uses it any more.	2022-07-12 16:06:15 +09:00
Paul O'Leary McCann	1baa334b8a	Make get_clusters_from_doc return spans in order. There's no guarantee about the order in which SpanGroup keys will come out, so access them in sorted order when doing comparisons.	2022-07-12 14:07:40 +09:00
Paul O'Leary McCann	64a0bf4460	Merge branch 'feature/coref' into coref/dimension-inference	2022-07-12 12:56:10 +09:00
Paul O'Leary McCann	7792229fa9	Merge branch 'master' into feature/coref	2022-07-11 20:16:23 +09:00
Paul O'Leary McCann	5969634e92	Merge branch 'master' into coref/dimension-inference	2022-07-11 20:11:51 +09:00
Paul O'Leary McCann	baeb35f31b	Add type annotations for internal models	2022-07-11 20:03:29 +09:00
Paul O'Leary McCann	4d032396b8	Merge branch 'feature/coref' into coref/dimension-inference	2022-07-11 19:18:46 +09:00
Paul O'Leary McCann	6d9eafeb37	Merge branch 'feature/coref' into fix/coref-alignment	2022-07-11 19:14:37 +09:00
Paul O'Leary McCann	1b3db149df	Merge branch 'fix/coref-alignment' into feature/coref	2022-07-11 19:12:03 +09:00
Paul O'Leary McCann	2eee0d248e	Fix types: mypy now exits without an error, except for two apparently unrelated ones about setup.py.	2022-07-08 18:29:14 +09:00
Paul O'Leary McCann	da81a90d64	Span predictor leftovers	2022-07-06 19:29:27 +09:00
Paul O'Leary McCann	b0800ea855	Do dimension inference in span predictor	2022-07-06 19:22:37 +09:00
Paul O'Leary McCann	b59b924e49	Use normal PyTorchWrapper in coref	2022-07-06 19:22:19 +09:00
Paul O'Leary McCann	f67c1735c5	Remove tok2vec_size from coref	2022-07-06 18:58:57 +09:00
Paul O'Leary McCann	bd17c38b74	It works! Was missing the serialization-related code from biaffine.	2022-07-06 18:58:22 +09:00
Paul O'Leary McCann	ba1bf8ae72	First take at dimension inference. This follows the pattern used in the Biaffine Parser, which uses an init function to get the size only after the tok2vec is available. This works at first, but serialization fails with an error.	2022-07-06 18:40:05 +09:00
Paul O'Leary McCann	c4de3e51a2	Remove old TODOs	2022-07-06 17:23:41 +09:00
Paul O'Leary McCann	6f5cf838ec	Remove _spans_to_offsets. It is basically the same as get_clusters_from_doc.	2022-07-06 14:05:05 +09:00
Paul O'Leary McCann	8f598d7b01	Feedback from code review	2022-07-06 14:03:09 +09:00
Paul O'Leary McCann	63e27b5e44	Update spacy/ml/models/coref_util.py. Co-authored-by: kadarakos <kadar.akos@gmail.com>	2022-07-06 13:46:02 +09:00
Daniël de Kok	a06cbae70d	precompute_hiddens/Parser: do not look up CPU ops (3.4) (#11069) * precompute_hiddens/Parser: do not look up CPU ops. `get_ops("cpu")` is quite expensive. To avoid this, we want to cache the result as in #11068. However, for 3.x we do not want to change the ABI, so we avoid the expensive lookup by using NumpyOps. This should have a minimal impact, since `get_ops("cpu")` was only used when the model ops were `CupyOps`. If the ops are `AppleOps`, we are still passing through the correct BLAS implementation. * _NUMPY_OPS -> NUMPY_OPS	2022-07-05 10:53:42 +02:00
Paul O'Leary McCann	c7f333d593	Rename spans2ints -> _spans_to_offsets	2022-07-04 19:28:35 +09:00
Paul O'Leary McCann	cf33b48fe0	Update tests	2022-07-03 20:10:53 +09:00
Paul O'Leary McCann	619b1102e6	Use config to specify tok2vec_size	2022-07-03 15:32:35 +09:00
Paul O'Leary McCann	201731df2d	Move spans2ints to util	2022-07-03 15:12:53 +09:00
Paul O'Leary McCann	79720886fa	Merge branch 'feature/coref' into fix/coref-alignment. Had to renumber an error message.	2022-07-01 19:09:29 +09:00
Madeesh Kannan	eaf66e7431	Add NVTX ranges to `TrainablePipe` components (#10965) * `TrainablePipe`: Add NVTX range decorator * Annotate `TrainablePipe` subclasses with NVTX ranges * Export function signature to allow introspection of args in tests * Revert "Annotate `TrainablePipe` subclasses with NVTX ranges" This reverts commit `d8684f7372`. * Revert "Export function signature to allow introspection of args in tests" This reverts commit `f4405ca3ad`. * Revert "`TrainablePipe`: Add NVTX range decorator" This reverts commit `26536eb6b8`. * Add `spacy.pipes_with_nvtx_range` pipeline callback * Show warnings for all missing user-defined pipe functions that need to be annotated. Fix imports, typos * Rename `DEFAULT_ANNOTATABLE_PIPE_METHODS` to `DEFAULT_NVTX_ANNOTATABLE_PIPE_METHODS`. Reorder import * Walk model nodes directly whilst applying NVTX ranges. Ignore pipe method wrapper when applying range	2022-06-30 11:28:12 +02:00
kadarakos	0076f0f617	span predictor device fix	2022-06-29 06:58:47 +00:00
kadarakos	1a782592c4	make sure same device	2022-06-28 12:53:20 +00:00
kadarakos	9f9453865a	Merge branch 'master' into feature/coref	2022-06-28 10:27:35 +00:00
Paul O'Leary McCann	d1ff933e9b	Test works. This may not be done yet, as the test is just for consistency and is not yet overfitting correctly.	2022-06-28 19:15:33 +09:00
Paul O'Leary McCann	ef5762d78e	Bad hack to get tests to run. This changes the tok2vec size in coref to a hardcoded 64 to get tests to run. This should be reverted and hopefully replaced with proper shape inference.	2022-06-28 19:06:13 +09:00
Paul O'Leary McCann	16894e665d	Refactor Coval Scoring code (#10875) * Move coref scoring code to scorer.py. Includes some renames to make names less generic. * Refactor coval code to remove ternary expressions * Black formatting * Add header * Make scorers into registered scorers * Small test fixes * Skip coref tests when torch not present. Coref can't be loaded without Torch, so nothing works. * Fix remaining type issues. Some of this just involves ignoring types in thorny areas. Two main issues: 1. some things have weird types due to indirection/argskwargs; 2. xp2torch return type seems to have changed at some point * Update spacy/scorer.py. Co-authored-by: kadarakos <kadar.akos@gmail.com> * Small changes from review * Be specific about the ValueError * Type fix Co-authored-by: kadarakos <kadar.akos@gmail.com>	2022-06-22 16:05:52 +09:00
github-actions[bot]	6313787fb6	Auto-format code with black (#10977) Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>	2022-06-17 19:41:55 +01:00
Daniël de Kok	3d3fbeda9f	Update for CBlas changes in Thinc 8.1.0.dev2 (#10970)	2022-06-16 11:42:34 +02:00
Daniël de Kok	a83a501195	precomputable_biaffine: avoid concatenation (#10911) The `forward` of `precomputable_biaffine` performs matrix multiplication and then `vstack`s the result with padding. This creates a temporary array used for the output of matrix concatenation. This change avoids the temporary by pre-allocating an array that is large enough for the output of matrix multiplication plus padding and fills the array in-place. This gave me a small speedup (a bit over 100 WPS) on de_core_news_lg on M1 Max (after changing thinc-apple-ops to support in-place gemm as BLIS does).	2022-06-10 18:12:28 +02:00
Paul O'Leary McCann	196886bbca	Fix coref size inference (#10916) * Add explicit tok2vec_size parameter in clusterer * Add tok2vec size to span predictor config * Minor fixes	2022-06-08 20:03:41 +09:00
github-actions[bot]	24aafdffad	Auto-format code with black (#10908) Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>	2022-06-03 11:01:55 +02:00
Paul O'Leary McCann	dca2e8c644	Minor NEL type fixes (#10860) * Fix TODO about typing. Fix was simple: just request an array2f. * Add type ignore. Maxout has a more restrictive type than the residual layer expects (only Floats2d vs any Floats). * Various cleanup. This moves a lot of lines around but doesn't change any functionality. Details: 1. use `continue` to reduce indentation; 2. move sentence doc building inside conditional since it's otherwise unused; 3. reduce some temporary assignments	2022-06-01 00:41:28 +02:00
Daniël de Kok	85dd2b6c04	Parser: use C saxpy/sgemm provided by the Ops implementation (#10773) * Parser: use C saxpy/sgemm provided by the Ops implementation. This is a backport of https://github.com/explosion/spaCy/pull/10747 from the parser refactor branch. It eliminates the explicit calls to BLIS, instead using the saxpy/sgemm provided by the Ops implementation. This allows us to use Accelerate in the parser on M1 Macs (with an updated thinc-apple-ops). Performance of the de_core_news_lg pipe: BLIS 0.7.0, no thinc-apple-ops: 6385 WPS; BLIS 0.7.0, thinc-apple-ops: 36455 WPS; BLIS 0.9.0, no thinc-apple-ops: 19188 WPS; BLIS 0.9.0, thinc-apple-ops: 36682 WPS; this PR, thinc-apple-ops: 38726 WPS. Performance of the de_core_news_lg pipe (only tok2vec -> parser): BLIS 0.7.0, no thinc-apple-ops: 13907 WPS; BLIS 0.7.0, thinc-apple-ops: 73172 WPS; BLIS 0.9.0, no thinc-apple-ops: 41576 WPS; BLIS 0.9.0, thinc-apple-ops: 72569 WPS; this PR, thinc-apple-ops: 87061 WPS. * Require thinc >=8.1.0,<8.2.0 * Lower thinc lower bound to 8.1.0.dev0 * Use best CPU ops for CBLAS when the parser model is on the GPU * Fix another unguarded cblas() call * Fix: use ops as a shorthand for self.model.ops Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>	2022-05-27 11:20:52 +02:00
svlandeg	cea40c9d7b	fix types + black formatting	2022-05-25 13:34:09 +02:00
Adriane Boyd	f75a528787	Update spacy/ml/models/spancat.py	2022-05-25 13:05:41 +02:00
svlandeg	015050f42c	Merge branch 'master' into feature/coref	2022-05-25 13:01:56 +02:00
Paul O'Leary McCann	838f50192b	Black formatting	2022-05-25 19:20:03 +09:00
Paul O'Leary McCann	2a8efda689	Code review suggestions, cleanup	2022-05-25 19:18:26 +09:00
Paul O'Leary McCann	e721c7bed8	Import cleanup	2022-05-25 19:12:20 +09:00
Richard Hudson	32954c3bcb	Fix issues for Mypy 0.950 and Pydantic 1.9.0 (#10786) * Make changes to typing * Correction * Format with black * Corrections based on review * Bumped Thinc dependency version * Bumped blis requirement * Correction for older Python versions * Update spacy/ml/models/textcat.py. Co-authored-by: Daniël de Kok <me@github.danieldk.eu> * Corrections based on review feedback * Re-add deleted docstring line Co-authored-by: Daniël de Kok <me@github.danieldk.eu>	2022-05-25 09:33:54 +02:00
Paul O'Leary McCann	c9233a5a1f	Import torch from thinc	2022-05-24 17:28:27 +09:00
Paul O'Leary McCann	5cbc9f4573	Use thinc.util.has_torch	2022-05-24 16:02:39 +09:00
Paul O'Leary McCann	b1118cee58	Move epsilon	2022-05-24 15:59:08 +09:00


313 Commits