Commit Graph

15368 Commits

Author SHA1 Message Date
Kádár Ákos
7a239f2ec7 report start- and end-accuracies separately 2022-04-08 14:57:19 +02:00
Kádár Ákos
2a1ad4c5d2 add backprop callback to spanpredictor 2022-04-08 14:56:44 +02:00
Kádár Ákos
3ba913109d update with eg.predicted as other components 2022-04-07 13:20:12 +02:00
Kádár Ákos
ef141ad399 span accuracy score 2022-04-04 18:10:09 +02:00
Kádár Ákos
a1d0219903 prepare for aligned heads-spans training 2022-04-04 15:26:15 +02:00
Kádár Ákos
63a41ba50a fix score overwriting bug 2022-03-30 17:28:20 +02:00
Kádár Ákos
7ff99a3acc nicer restore 2022-03-28 18:16:41 +02:00
Kádár Ákos
06d680b269 addressing suggestions by @polm 2022-03-28 14:31:51 +02:00
Kádár Ákos
e4b4b67ef6 handle empty clusters 2022-03-28 11:29:00 +02:00
Kádár Ákos
4fc40340f9 handle empty head_ids 2022-03-28 11:28:21 +02:00
Kádár Ákos
7304604edd make sure predicted and reference keep aligned 2022-03-25 18:29:33 +01:00
Kádár Ákos
83ac0477c8 remove useless extra prefix and device from spanpredictor 2022-03-24 16:44:50 +01:00
Kádár Ákos
1c5dabcb47 merge SpanPredictor attributes 2022-03-24 16:23:12 +01:00
Kádár Ákos
a872c69ffb merge 2022-03-24 16:10:04 +01:00
Kádár Ákos
706b2e6f25 gearing up SpanPredictor for gold-heads 2022-03-24 16:06:20 +01:00
Kádár Ákos
150e7c46d7 conflict 2022-03-23 11:27:02 +01:00
Kádár Ákos
1eaf8fb0cf span predictor debug start 2022-03-23 11:24:27 +01:00
Paul O'Leary McCann
eec00ce60d Fix various sizes in SpanPredictor FFNN 2022-03-23 16:20:31 +09:00
Paul O'Leary McCann
2190cbc0e6 Add progress on SpanPredictor component
This isn't working. There is a CUDA error in the torch code during
initialization and it's not clear why.
2022-03-19 19:39:49 +09:00
Kádár Ákos
db422abf01 remove unnecessary .device 2022-03-18 16:24:26 +01:00
Paul O'Leary McCann
a098849112 Add fake batching
The way fake batching works is that the pipeline component calls the
model repeatedly in a loop internally. It feels like this should break
something, but it worked in testing.

Another issue is that this changes the signature of some of the pipeline
functions, though I don't think that's an issue.

Tested with batch size of 2, so more testing is needed, but this is a
start.
2022-03-18 19:46:58 +09:00
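The "fake batching" described in the commit above can be pictured roughly as follows. This is only a minimal sketch under stated assumptions: `predict_fake_batched` and `model_fn` are hypothetical names chosen for illustration, not the functions used in the actual component.

```python
from typing import Callable, List, Sequence, TypeVar

D = TypeVar("D")  # input item, e.g. one document
P = TypeVar("P")  # prediction for one item


def predict_fake_batched(model_fn: Callable[[D], P], docs: Sequence[D]) -> List[P]:
    """Accept a batch, but call the model once per item in a loop.

    Hypothetical helper: the caller sees a batch interface, while the
    model itself only ever processes a single item at a time.
    """
    return [model_fn(doc) for doc in docs]


# Stand-in "model" that just measures its input.
predictions = predict_fake_batched(len, ["ab", "cde"])
```

The trade-off is that nothing is computed in parallel across items, so this keeps the interface batch-shaped without any of the speed benefit of real batching.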
Paul O'Leary McCann
1a79d18796 Formatting 2022-03-16 20:10:47 +09:00
Paul O'Leary McCann
6855df0e66 Skeleton for span predictor component
This should be moved into its own file, but for now just stubbing out
the methods.
2022-03-16 20:09:33 +09:00
Paul O'Leary McCann
0275ae29de Remove stale comment 2022-03-16 20:09:12 +09:00
Paul O'Leary McCann
6974f55daa Hack for transformer listener size 2022-03-16 15:15:53 +09:00
Paul O'Leary McCann
7811a1194b Change architecture 2022-03-16 14:57:15 +09:00
Paul O'Leary McCann
5650853c0f Remove unused functions 2022-03-16 14:38:11 +09:00
Paul O'Leary McCann
d0ae2590db Delete all the coref-hoi code 2022-03-15 20:05:24 +09:00
Paul O'Leary McCann
abdc7d87af Clean up util code
Moved everything into coref_util.py, deleted wl-specific file.
2022-03-15 19:59:44 +09:00
Paul O'Leary McCann
55039a66ad Remove old default config 2022-03-15 19:53:09 +09:00
Paul O'Leary McCann
17d017a177 Remove span2head
This doesn't work as a component because it needs to modify gold data,
so instead it's a conversion script (in another repo).
2022-03-15 19:52:20 +09:00
Paul O'Leary McCann
0522a43116 Make span2head component 2022-03-15 19:19:15 +09:00
Paul O'Leary McCann
e6917d8dc4 Add util functions for wl-coref 2022-03-14 19:27:55 +09:00
Paul O'Leary McCann
dfec6993d6 Training works now 2022-03-14 19:27:23 +09:00
Paul O'Leary McCann
8eadf3781b Training runs now
Evaluation needs fixing, and code still needs cleanup.
2022-03-14 19:02:17 +09:00
Paul O'Leary McCann
d22a002641 Forward/backward pass works
Evaluate does not work - predict hasn't been updated
2022-03-14 17:26:27 +09:00
Paul O'Leary McCann
c4f9c24738 The coref model is able to be loaded
The span predictor component is initialized but not used at all now.
Plan is to work on it after the word level clustering part is trainable
end-to-end.
2022-03-09 19:31:11 +09:00
Paul O'Leary McCann
35cc2b138f Add span predictor code
Accidentally omitted before
2022-03-08 18:13:26 +09:00
Paul O'Leary McCann
1c697b4011 Remove references to config
Replaced with model arguments
2022-03-08 18:13:09 +09:00
Paul O'Leary McCann
c0cd5025e3 Start bringing in wl-coref
This absolutely does not work. First step here is getting over most of
the code in roughly the files we want it in. After the code has been
pulled over it can be restructured to match spaCy and cleaned up.
2022-03-06 20:00:15 +09:00
svlandeg
0c15ab7ca1 remove irrelevant unit test (behaviour clarified by new error msgs around doc.spans) 2022-02-07 12:17:18 +01:00
Paul O'Leary McCann
c7f586c4ba Merge branch 'master' into feature/coref
This brings coref up to date, in particular giving access to 3.2
features.
2022-02-03 19:01:18 +09:00
Lj Miranda
345e7f6bc4
Clarify Span.ents documentation (#10154)
* Clarify Span.ents documentation

Ref: #10135

Retain current behaviour. Span.ents will only include entities within
said span. You can't get tokens outside of the original span.

* Reword docstrings

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update API docs in the website
2022-01-31 08:41:42 +01:00
Marek Šuppa
f09c799a96
fix: Add missing comma to _eleven_to_beyond (#10166)
* This comma has most probably been left out unintentionally, leading to string concatenation between the two consecutive lines. This issue has been found automatically using a regular expression.
2022-01-30 16:45:06 +09:00
Marek Šuppa
67ecac633f
fix: Add missing comma to examples.py (#10167)
* This comma has most probably been left out unintentionally, leading to string concatenation between the two consecutive lines. This issue has been found automatically using a regular expression.
2022-01-30 16:43:29 +09:00
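The two missing-comma fixes above rely on a general Python behaviour: adjacent string literals are concatenated implicitly, so a dropped comma in a list silently merges two entries. A minimal sketch, using hypothetical list contents rather than the actual entries from the files these commits touch:

```python
# Hypothetical word lists for illustration only.
broken = [
    "eleven",
    "twelve"  # missing comma: this literal merges with the next one
    "thirteen",
]

fixed = [
    "eleven",
    "twelve",
    "thirteen",
]

print(len(broken), broken[1])  # 2 twelvethirteen
print(len(fixed))              # 3
```

No error is raised in the broken case, which is presumably why a regular-expression search was a practical way to find such spots.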
Adriane Boyd
4f441dfa24
Fix infix as prefix in Tokenizer.explain (#10140)
* Fix infix as prefix in Tokenizer.explain

Update `Tokenizer.explain` to align with the `Tokenizer` algorithm:

* skip infix matches that are prefixes in the current substring

* Update tokenizer pseudocode in docs
2022-01-28 17:00:54 +01:00
Eduard Zorita
30cf9d6a05
Update typing hints (#10109)
* Improve typing hints for Matcher.__call__

* Add typing hints for DependencyMatcher

* Add typing hints to underscore extensions

* Update Doc.tensor type (requires numpy 1.21)

* Fix typing hints for Language.component decorator

* Use generic np.ndarray type in Doc to avoid numpy version update

* Fix mypy errors

* Fix cyclic import caused by Underscore typing hints

* Use Literal type from spacy.compat

* Update matcher.pyi import format

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-01-28 16:59:54 +01:00
Adriane Boyd
09734c56fc
Use simple suggester for spancat initialization (#10143)
Instead of running the actual suggester, which may require
annotation from annotating components that is not necessarily present in
the reference docs, use the built-in 1-gram suggester.
2022-01-28 09:34:23 +01:00
github-actions[bot]
6d4db5c3c7
Auto-format code with black (#10106)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-01-21 10:01:10 +01:00
Ines Montani
34ed93ef68
Support version tags in universe and add note about reporting (#10093)
* Support version tags in universe and add note about reporting

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-01-20 23:21:26 +01:00