Commit Graph

234 Commits

Author SHA1 Message Date
Paul O'Leary McCann
c4de3e51a2 Remove old TODOs 2022-07-06 17:23:41 +09:00
kadarakos
0076f0f617 span predictor device fix 2022-06-29 06:58:47 +00:00
kadarakos
1a782592c4 make sure same device 2022-06-28 12:53:20 +00:00
kadarakos
9f9453865a Merge branch 'master' into feature/coref 2022-06-28 10:27:35 +00:00
Paul O'Leary McCann
16894e665d
Refactor Coval Scoring code (#10875)
* Move coref scoring code to scorer.py

Includes some renames to make names less generic.

* Refactor coval code to remove ternary expressions

* Black formatting

* Add header

* Make scorers into registered scorers

* Small test fixes

* Skip coref tests when torch not present

Coref can't be loaded without Torch, so nothing works.

* Fix remaining type issues

Some of this just involves ignoring types in thorny areas. Two main
issues:

1. Some things have weird types due to indirection/ argskwargs
2. xp2torch return type seems to have changed at some point

* Update spacy/scorer.py

Co-authored-by: kadarakos <kadar.akos@gmail.com>

* Small changes from review

* Be specific about the ValueError

* Type fix

Co-authored-by: kadarakos <kadar.akos@gmail.com>
2022-06-22 16:05:52 +09:00
Paul O'Leary McCann
196886bbca
Fix coref size inference (#10916)
* Add explicit tok2vec_size parameter in clusterer

* Add tok2vec size to span predictor config

* Minor fixes
2022-06-08 20:03:41 +09:00
github-actions[bot]
24aafdffad
Auto-format code with black (#10908)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-06-03 11:01:55 +02:00
Paul O'Leary McCann
dca2e8c644
Minor NEL type fixes (#10860)
* Fix TODO about typing

Fix was simple: just request an array2f.

* Add type ignore

Maxout has a more restrictive type than the residual layer expects (only
Floats2d vs any Floats).

* Various cleanup

This moves a lot of lines around but doesn't change any functionality.
Details:

1. use `continue` to reduce indentation
2. move sentence doc building inside conditional since it's otherwise
   unused
3. reduces some temporary assignments
2022-06-01 00:41:28 +02:00
svlandeg
cea40c9d7b fix types + black formatting 2022-05-25 13:34:09 +02:00
Adriane Boyd
f75a528787
Update spacy/ml/models/spancat.py 2022-05-25 13:05:41 +02:00
svlandeg
015050f42c Merge branch 'master' into feature/coref 2022-05-25 13:01:56 +02:00
Paul O'Leary McCann
838f50192b Black formatting 2022-05-25 19:20:03 +09:00
Paul O'Leary McCann
2a8efda689 Code review suggestions, cleanup 2022-05-25 19:18:26 +09:00
Paul O'Leary McCann
e721c7bed8 Import cleanup 2022-05-25 19:12:20 +09:00
Richard Hudson
32954c3bcb
Fix issues for Mypy 0.950 and Pydantic 1.9.0 (#10786)
* Make changes to typing

* Correction

* Format with black

* Corrections based on review

* Bumped Thinc dependency version

* Bumped blis requirement

* Correction for older Python versions

* Update spacy/ml/models/textcat.py

Co-authored-by: Daniël de Kok <me@github.danieldk.eu>

* Corrections based on review feedback

* Readd deleted docstring line

Co-authored-by: Daniël de Kok <me@github.danieldk.eu>
2022-05-25 09:33:54 +02:00
Paul O'Leary McCann
c9233a5a1f Import torch from thinc 2022-05-24 17:28:27 +09:00
Paul O'Leary McCann
5cbc9f4573 Use thinc.util.has_torch 2022-05-24 16:02:39 +09:00
Paul O'Leary McCann
b1118cee58 Move epsilon 2022-05-24 15:59:08 +09:00
Paul O'Leary McCann
9da16df96e Add guards around torch import
Torch is required for the coref/spanpred models but shouldn't be
required for spaCy in general.

The one tricky part of this is that one function in coref_util relied on
torch, but that file was imported in several places. Since the function
was only used in one place I moved it there.
2022-05-24 15:16:25 +09:00
kadarakos
1dc3894447 new parameters 2022-05-17 15:36:32 +00:00
kadarakos
403fb95d56 merge 2022-05-17 06:56:34 +00:00
Paul O'Leary McCann
2e8f0e9168 Rename coref params 2022-05-16 16:50:10 +09:00
Paul O'Leary McCann
13481fbcc2 Remove unused param, add TODOs about typing 2022-05-13 19:29:28 +09:00
kadarakos
b7ac4b33e2 fixing arguments 2022-05-11 14:59:59 +00:00
kadarakos
7cf6bcca0e merge misery 2022-05-10 17:19:16 +00:00
kadarakos
e512874c80 small refactor and docs 2022-05-10 16:40:31 +00:00
Paul O'Leary McCann
33f4f90ff0 Formatting 2022-05-10 19:09:52 +09:00
Paul O'Leary McCann
41fc092674 Split span predictor model into its own file 2022-05-10 19:08:21 +09:00
svlandeg
6b51258a58 clean up unused imports + black formatting 2022-05-09 13:34:50 +02:00
Paul O'Leary McCann
683f470852 Merge branch 'master' into feature/coref 2022-04-18 18:39:08 +09:00
kadarakos
b53113e3b8
Preparing span predictor for predicting from gold (#10547)
Note this is squashed because rebasing had conflicts.

* remove unnecessary .device

* span predictor debug start

* gearing up SpanPredictor for gold-heads

* merge SpanPredictor attributes

* remove useless extra prefix and device from spanpredictor

* make sure predicted and reference keeps aligned

* handle empty head_ids

* handle empty clusters

* addressing suggestions by @polm

* nicer restore

* fix score overwriting bug

* prepare for aligned heads-spans training

* span accuracy score

* update with eg.predited as other components

* add backprop callback to spanpredictor

* report start- and end-accuracies separately

* fixing scorer

Co-authored-by: Kádár Ákos <akos@onyx.uvt.nl>
2022-04-13 19:42:49 +09:00
Kádár Ákos
2a1ad4c5d2 add backprop callback to spanpredictor 2022-04-08 14:56:44 +02:00
Kádár Ákos
4fc40340f9 handle empty head_ids 2022-03-28 11:28:21 +02:00
Kádár Ákos
83ac0477c8 remove useless extra prefix and device from spanpredictor 2022-03-24 16:44:50 +01:00
Kádár Ákos
1c5dabcb47 merge SpanPredictor attributes 2022-03-24 16:23:12 +01:00
Kádár Ákos
a872c69ffb merge 2022-03-24 16:10:04 +01:00
Kádár Ákos
706b2e6f25 gearing up SpanPredictor for gold-heads 2022-03-24 16:06:20 +01:00
Kádár Ákos
150e7c46d7 conflict 2022-03-23 11:27:02 +01:00
Kádár Ákos
1eaf8fb0cf span predictor debug start 2022-03-23 11:24:27 +01:00
Paul O'Leary McCann
eec00ce60d Fix various sizes in SpanPredictor FFNN 2022-03-23 16:20:31 +09:00
Paul O'Leary McCann
2190cbc0e6 Add progress on SpanPredictor component
This isn't working. There is a CUDA error in the torch code during
initialization and it's not clear why.
2022-03-19 19:39:49 +09:00
Kádár Ákos
db422abf01 remove unnecessary .device 2022-03-18 16:24:26 +01:00
Paul O'Leary McCann
0275ae29de Remove stale comment 2022-03-16 20:09:12 +09:00
Paul O'Leary McCann
6974f55daa Hack for transformer listener size 2022-03-16 15:15:53 +09:00
Paul O'Leary McCann
5650853c0f Remove unused functions 2022-03-16 14:38:11 +09:00
Daniël de Kok
e5debc68e4
Tagger: use unnormalized probabilities for inference (#10197)
* Tagger: use unnormalized probabilities for inference

Using unnormalized softmax avoids use of the relatively expensive exp function,
which can significantly speed up non-transformer models (e.g. I got a speedup
of 27% on a German tagging + parsing pipeline).

* Add spacy.Tagger.v2 with configurable normalization

Normalization of probabilities is disabled by default to improve
performance.

* Update documentation, models, and tests to spacy.Tagger.v2

* Move Tagger.v1 to spacy-legacy

* docs/architectures: run prettier

* Unnormalized softmax is now a Softmax_v2 option

* Require thinc 8.0.14 and spacy-legacy 3.0.9
2022-03-15 14:15:31 +01:00
Paul O'Leary McCann
d0ae2590db Delete all the coref-hoi code 2022-03-15 20:05:24 +09:00
Paul O'Leary McCann
abdc7d87af Clean up util code
Moved everything into coref_util.py, deleted wl-specific file.
2022-03-15 19:59:44 +09:00
Paul O'Leary McCann
0522a43116 Make span2head component 2022-03-15 19:19:15 +09:00
Paul O'Leary McCann
e6917d8dc4 Add util functions for wl-coref 2022-03-14 19:27:55 +09:00