258 Commits

Author SHA1 Message Date
Paul O'Leary McCann
2e9dadfda4 Remove orphaned function
This was probably used in the prototyping stage, left as a reference,
and then forgotten. Nothing uses it any more.
2022-07-12 16:06:15 +09:00
Paul O'Leary McCann
1baa334b8a Make get_clusters_from_doc return spans in order
There's no guarantee about the order in which SpanGroup keys will come
out, so access them in sorted order when doing comparisons.
2022-07-12 14:07:40 +09:00
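The ordering problem described in 1baa334b8a can be illustrated with a small sketch. This is not the code from that commit: a plain dict stands in for the SpanGroup mapping, `(start, end)` tuples stand in for spans, and the name `clusters_in_key_order` and the `coref_clusters` key prefix are assumptions made for the example.

```python
from typing import Dict, List, Tuple

Offsets = Tuple[int, int]


def clusters_in_key_order(
    span_groups: Dict[str, List[Offsets]], prefix: str
) -> List[List[Offsets]]:
    """Collect clusters whose key starts with `prefix`, visiting keys in sorted order.

    Hypothetical stand-in for a helper like get_clusters_from_doc; a plain
    dict replaces the real SpanGroup container.
    """
    return [
        list(span_groups[key])
        for key in sorted(span_groups)  # sorted keys, not insertion order
        if key.startswith(prefix)
    ]


# The same clusters stored in two different insertion orders.
gold = {
    "coref_clusters_2": [(5, 6)],
    "coref_clusters_1": [(0, 2), (3, 4)],
    "other_spans": [(9, 10)],
}
pred = {
    "coref_clusters_1": [(0, 2), (3, 4)],
    "coref_clusters_2": [(5, 6)],
}

# Because keys are visited in sorted order, the comparison is stable.
same = clusters_in_key_order(gold, "coref_clusters") == clusters_in_key_order(
    pred, "coref_clusters"
)
```

Sorting string keys is lexicographic, so a key ending in `_10` sorts before one ending in `_2`. That is acceptable when the only goal is a deterministic order on both sides of a comparison.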
Paul O'Leary McCann
64a0bf4460 Merge branch 'feature/coref' into coref/dimension-inference 2022-07-12 12:56:10 +09:00
Paul O'Leary McCann
baeb35f31b Add type annotations for internal models 2022-07-11 20:03:29 +09:00
Paul O'Leary McCann
4d032396b8 Merge branch 'feature/coref' into coref/dimension-inference 2022-07-11 19:18:46 +09:00
Paul O'Leary McCann
6d9eafeb37 Merge branch 'feature/coref' into fix/coref-alignment 2022-07-11 19:14:37 +09:00
Paul O'Leary McCann
1b3db149df Merge branch 'fix/coref-alignment' into feature/coref 2022-07-11 19:12:03 +09:00
Paul O'Leary McCann
2eee0d248e Fix types
mypy now exits without an error, except for two apparently unrelated
ones about setup.py.
2022-07-08 18:29:14 +09:00
Paul O'Leary McCann
da81a90d64 Span predictor leftovers 2022-07-06 19:29:27 +09:00
Paul O'Leary McCann
b0800ea855 Do dimension inference in span predictor 2022-07-06 19:22:37 +09:00
Paul O'Leary McCann
b59b924e49 Use normal PyTorchWrapper in coref 2022-07-06 19:22:19 +09:00
Paul O'Leary McCann
f67c1735c5 Remove tok2vec_size from coref 2022-07-06 18:58:57 +09:00
Paul O'Leary McCann
bd17c38b74 It works!
Was missing the serialization-related code from biaffine.
2022-07-06 18:58:22 +09:00
Paul O'Leary McCann
ba1bf8ae72 First take at dimension inference
This follows the pattern used in the Biaffine Parser, which uses an init
function to get the size only after the tok2vec is available.

This works at first, but serialization fails with an error.
2022-07-06 18:40:05 +09:00
Paul O'Leary McCann
c4de3e51a2 Remove old TODOs 2022-07-06 17:23:41 +09:00
Paul O'Leary McCann
6f5cf838ec Remove _spans_to_offsets
Basically the same as get_clusters_from_doc
2022-07-06 14:05:05 +09:00
Paul O'Leary McCann
8f598d7b01 Feedback from code review 2022-07-06 14:03:09 +09:00
Paul O'Leary McCann
63e27b5e44 Update spacy/ml/models/coref_util.py
Co-authored-by: kadarakos <kadar.akos@gmail.com>
2022-07-06 13:46:02 +09:00
Paul O'Leary McCann
c7f333d593 Rename spans2ints > _spans_to_offsets 2022-07-04 19:28:35 +09:00
Paul O'Leary McCann
cf33b48fe0 Update tests 2022-07-03 20:10:53 +09:00
Paul O'Leary McCann
619b1102e6 Use config to specify tok2vec_size 2022-07-03 15:32:35 +09:00
Paul O'Leary McCann
201731df2d Move spans2ints to util 2022-07-03 15:12:53 +09:00
Paul O'Leary McCann
79720886fa Merge branch 'feature/coref' into fix/coref-alignment
Had to renumber error message.
2022-07-01 19:09:29 +09:00
kadarakos
0076f0f617 span predictor device fix 2022-06-29 06:58:47 +00:00
kadarakos
1a782592c4 make sure same device 2022-06-28 12:53:20 +00:00
kadarakos
9f9453865a Merge branch 'master' into feature/coref 2022-06-28 10:27:35 +00:00
Paul O'Leary McCann
d1ff933e9b Test works
This may not be done yet, as the test is just for consistency, and not
overfitting correctly yet.
2022-06-28 19:15:33 +09:00
Paul O'Leary McCann
ef5762d78e Bad hack to get tests to run
This changes the tok2vec size in coref to hardcoded 64 to get tests to
run. This should be reverted and hopefully replaced with proper shape
inference.
2022-06-28 19:06:13 +09:00
Paul O'Leary McCann
16894e665d Refactor Coval Scoring code (#10875)
* Move coref scoring code to scorer.py

Includes some renames to make names less generic.

* Refactor coval code to remove ternary expressions

* Black formatting

* Add header

* Make scorers into registered scorers

* Small test fixes

* Skip coref tests when torch not present

Coref can't be loaded without Torch, so nothing works.

* Fix remaining type issues

Some of this just involves ignoring types in thorny areas. Two main
issues:

1. Some things have weird types due to indirection/ argskwargs
2. xp2torch return type seems to have changed at some point

* Update spacy/scorer.py

Co-authored-by: kadarakos <kadar.akos@gmail.com>

* Small changes from review

* Be specific about the ValueError

* Type fix

Co-authored-by: kadarakos <kadar.akos@gmail.com>
2022-06-22 16:05:52 +09:00
Paul O'Leary McCann
196886bbca Fix coref size inference (#10916)
* Add explicit tok2vec_size parameter in clusterer

* Add tok2vec size to span predictor config

* Minor fixes
2022-06-08 20:03:41 +09:00
github-actions[bot]
24aafdffad Auto-format code with black (#10908)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-06-03 11:01:55 +02:00
Paul O'Leary McCann
dca2e8c644 Minor NEL type fixes (#10860)
* Fix TODO about typing

Fix was simple: just request an array2f.

* Add type ignore

Maxout has a more restrictive type than the residual layer expects (only
Floats2d vs any Floats).

* Various cleanup

This moves a lot of lines around but doesn't change any functionality.
Details:

1. use `continue` to reduce indentation
2. move sentence doc building inside conditional since it's otherwise
   unused
3. reduces some temporary assignments
2022-06-01 00:41:28 +02:00
svlandeg
cea40c9d7b fix types + black formatting 2022-05-25 13:34:09 +02:00
Adriane Boyd
f75a528787 Update spacy/ml/models/spancat.py 2022-05-25 13:05:41 +02:00
svlandeg
015050f42c Merge branch 'master' into feature/coref 2022-05-25 13:01:56 +02:00
Paul O'Leary McCann
838f50192b Black formatting 2022-05-25 19:20:03 +09:00
Paul O'Leary McCann
2a8efda689 Code review suggestions, cleanup 2022-05-25 19:18:26 +09:00
Paul O'Leary McCann
e721c7bed8 Import cleanup 2022-05-25 19:12:20 +09:00
Richard Hudson
32954c3bcb Fix issues for Mypy 0.950 and Pydantic 1.9.0 (#10786)
* Make changes to typing

* Correction

* Format with black

* Corrections based on review

* Bumped Thinc dependency version

* Bumped blis requirement

* Correction for older Python versions

* Update spacy/ml/models/textcat.py

Co-authored-by: Daniël de Kok <me@github.danieldk.eu>

* Corrections based on review feedback

* Readd deleted docstring line

Co-authored-by: Daniël de Kok <me@github.danieldk.eu>
2022-05-25 09:33:54 +02:00
Paul O'Leary McCann
c9233a5a1f Import torch from thinc 2022-05-24 17:28:27 +09:00
Paul O'Leary McCann
5cbc9f4573 Use thinc.util.has_torch 2022-05-24 16:02:39 +09:00
Paul O'Leary McCann
b1118cee58 Move epsilon 2022-05-24 15:59:08 +09:00
Paul O'Leary McCann
9da16df96e Add guards around torch import
Torch is required for the coref/spanpred models but shouldn't be
required for spaCy in general.

The one tricky part of this is that one function in coref_util relied on
torch, but that file was imported in several places. Since the function
was only used in one place I moved it there.
2022-05-24 15:16:25 +09:00
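The guard pattern described in 9da16df96e can be sketched as follows. This is an illustration under assumptions, not the code from that commit: it uses the standard library's `importlib.util.find_spec` for the availability check (a later commit in this log switches to `thinc.util.has_torch`), and `mean_vector` is a hypothetical torch-dependent helper invented for the example.

```python
import importlib.util

# True only if the optional dependency can be located; nothing is imported yet,
# so importing this module works on machines without PyTorch installed.
has_torch = importlib.util.find_spec("torch") is not None


def mean_vector(rows):
    """Hypothetical helper that needs torch; returns the column-wise mean.

    The torch import lives inside the one function that uses it, so other
    modules can import this file without requiring torch.
    """
    if not has_torch:
        raise ImportError(
            "mean_vector requires PyTorch; install torch to use this helper"
        )
    import torch  # deferred import, reached only when torch is available

    return torch.tensor(rows, dtype=torch.float32).mean(dim=0).tolist()
```

Keeping the import inside the function mirrors the commit's choice to move the single torch-dependent function to the place it is used, rather than guarding every module that imports the shared utility file.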
kadarakos
1dc3894447 new parameters 2022-05-17 15:36:32 +00:00
kadarakos
403fb95d56 merge 2022-05-17 06:56:34 +00:00
Paul O'Leary McCann
2e8f0e9168 Rename coref params 2022-05-16 16:50:10 +09:00
Paul O'Leary McCann
13481fbcc2 Remove unused param, add TODOs about typing 2022-05-13 19:29:28 +09:00
kadarakos
b7ac4b33e2 fixing arguments 2022-05-11 14:59:59 +00:00
kadarakos
7cf6bcca0e merge misery 2022-05-10 17:19:16 +00:00
kadarakos
e512874c80 small refactor and docs 2022-05-10 16:40:31 +00:00