Commit Graph

15700 Commits

Author SHA1 Message Date
Paul O'Leary McCann
baeb35f31b Add type annotations for internal models 2022-07-11 20:03:29 +09:00
Paul O'Leary McCann
4d032396b8 Merge branch 'feature/coref' into coref/dimension-inference 2022-07-11 19:18:46 +09:00
Paul O'Leary McCann
9cbb9702c0
Merge pull request #11042 from polm/fix/coref-alignment
Fix tokenization mismatch handling in coref
2022-07-11 19:15:05 +09:00
Paul O'Leary McCann
6d9eafeb37
Merge branch 'feature/coref' into fix/coref-alignment 2022-07-11 19:14:37 +09:00
Paul O'Leary McCann
2c2791daa5
Merge pull request #11087 from polm/coref/doc-update
Update Coref Docs
2022-07-11 19:03:14 +09:00
Paul O'Leary McCann
2eee0d248e Fix types
mypy now exits without an error, except for two apparently unrelated
ones about setup.py.
2022-07-08 18:29:14 +09:00
Paul O'Leary McCann
da81a90d64 Span predictor leftovers 2022-07-06 19:29:27 +09:00
Paul O'Leary McCann
b0800ea855 Do dimension inference in span predictor 2022-07-06 19:22:37 +09:00
Paul O'Leary McCann
b59b924e49 Use normal PyTorchWrapper in coref 2022-07-06 19:22:19 +09:00
Paul O'Leary McCann
f67c1735c5 Remove tok2vec_size from coref 2022-07-06 18:58:57 +09:00
Paul O'Leary McCann
bd17c38b74 It works!
Was missing the serialization-related code from biaffine.
2022-07-06 18:58:22 +09:00
Paul O'Leary McCann
ba1bf8ae72 First take at dimension inference
This follows the pattern used in the Biaffine Parser, which uses an init
function to get the size only after the tok2vec is available.

This works at first, but serialization fails with an error.
2022-07-06 18:40:05 +09:00
Paul O'Leary McCann
ce49136458 Update NotImplementedError for coref component 2022-07-06 17:28:15 +09:00
Paul O'Leary McCann
5e405738d2 Update span predictor docstrings 2022-07-06 17:28:05 +09:00
Paul O'Leary McCann
c4de3e51a2 Remove old TODOs 2022-07-06 17:23:41 +09:00
Paul O'Leary McCann
da9c379355 Update docs
Parameter names in architecture docs were not updated after parameters
were renamed.
2022-07-06 17:13:31 +09:00
Paul O'Leary McCann
6f5cf838ec Remove _spans_to_offsets
Basically the same as get_clusters_from_doc
2022-07-06 14:05:05 +09:00
Paul O'Leary McCann
8f598d7b01 Feedback from code review 2022-07-06 14:03:09 +09:00
Paul O'Leary McCann
63e27b5e44
Update spacy/ml/models/coref_util.py
Co-authored-by: kadarakos <kadar.akos@gmail.com>
2022-07-06 13:46:02 +09:00
Paul O'Leary McCann
178feae00a Add tests to give up with whitespace differences
Docs in Examples are allowed to have arbitrarily different whitespace.
Handling that properly would be nice but isn't required, so for now
check for it and blow up.
2022-07-04 19:37:42 +09:00
Paul O'Leary McCann
c7f333d593 Rename spans2ints > _spans_to_offsets 2022-07-04 19:28:35 +09:00
Paul O'Leary McCann
b09bbc7f5e Fix alignment issues
I believe this resolves issues with tokenization mismatches.
2022-07-03 20:11:03 +09:00
Paul O'Leary McCann
cf33b48fe0 Update tests 2022-07-03 20:10:53 +09:00
Paul O'Leary McCann
fd574a89c4 Update overfitting test 2022-07-03 19:34:15 +09:00
Paul O'Leary McCann
a46bc03abb Add failing test with tokenization mismatch
This test only fails due to the explicit assert False at the moment,
but the debug output shows that the learned spans are all off by one due
to misalignment. So the code still needs fixing.
2022-07-03 16:01:27 +09:00
Paul O'Leary McCann
619b1102e6 Use config to specify tok2vec_size 2022-07-03 15:32:35 +09:00
Paul O'Leary McCann
1a4dbb702d Add basic span predictor tests 2022-07-03 15:13:15 +09:00
Paul O'Leary McCann
201731df2d Move spans2ints to util 2022-07-03 15:12:53 +09:00
Paul O'Leary McCann
1dacecbbfb Run black 2022-07-03 14:49:02 +09:00
Paul O'Leary McCann
5192ac1617 Clean tests. 2022-07-03 14:48:42 +09:00
Paul O'Leary McCann
79720886fa Merge branch 'feature/coref' into fix/coref-alignment
Had to renumber error message.
2022-07-01 19:09:29 +09:00
Paul O'Leary McCann
c59aeeb0ae
Merge pull request #11043 from kadarakos/feature/coref
Merging master into Feature/coref
2022-07-01 19:04:21 +09:00
Paul O'Leary McCann
dd812ca84a Handle case with nothing to score in span predictor
This case was not handled correctly. It may be desirable to make changes
in the coref component to make sure this doesn't happen, but the span
predictor should also handle this kind of data intelligently internally.

Note that something is still weird because the span predictor seems to
not be learning.
2022-06-29 19:30:37 +09:00
kadarakos
0076f0f617 span predictor device fix 2022-06-29 06:58:47 +00:00
kadarakos
1a782592c4 make sure same device 2022-06-28 12:53:20 +00:00
kadarakos
9f9453865a Merge branch 'master' into feature/coref 2022-06-28 10:27:35 +00:00
Paul O'Leary McCann
d1ff933e9b Test works
This may not be done yet, as the test is just for consistency, and not
overfitting correctly yet.
2022-06-28 19:15:33 +09:00
Paul O'Leary McCann
ef5762d78e Bad hack to get tests to run
This changes the tok2vec size in coref to hardcoded 64 to get tests to
run. This should be reverted and hopefully replaced with proper shape
inference.
2022-06-28 19:06:13 +09:00
Paul O'Leary McCann
af6d5ae2fe Initial test of mismatched tokenization
This runs, but the results are nonsense because the indices are off.
2022-06-28 19:05:47 +09:00
Eric Holscher
308a612ec9
Remove simply (#11017)
I was reading this page, and as a relative beginner, nothing about it was simple :)
2022-06-27 09:45:22 +02:00
github-actions[bot]
4155a59d47
Auto-format code with black (#11022)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-06-27 09:35:35 +02:00
Adriane Boyd
738b38064f
Merge pull request #11021 from adrianeboyd/chore/v3.4.0
Set version to v3.4.0
2022-06-24 14:54:16 +02:00
Madeesh Kannan
8f1ba4de58
Backport parser/alignment optimizations from feature/refactor-parser (#10952) 2022-06-24 13:39:52 +02:00
Adriane Boyd
d9320db7db Temporarily skip tests that require models/compat 2022-06-24 11:20:53 +02:00
Adriane Boyd
bffe54d02b Set version to v3.4.0 2022-06-24 08:48:58 +02:00
Peter Baumgartner
9738b69c0e
Update Code Conventions.md (#11018) 2022-06-24 15:11:29 +09:00
Dmytro Sadovnychyi
4cd8b4cc22
Fix some of the broken links on universe pages (#11011)
Currently some of the "AUTHOR INFO" links (e.g. here[0]) are broken:

```
https://github.com/https://github.com/explosion
```

[0] https://spacy.io/universe/project/spacy-experimental


Also one remains broken with `https://szegedai.github.io/`.
2022-06-23 17:53:00 +02:00
Sofie Van Landeghem
f8116078ce
disable failing test because Stanford servers are down (#11015) 2022-06-23 10:57:46 +02:00
Adriane Boyd
d4e3f43639
Update thinc version to switch back to blis v0.7 (#11014) 2022-06-23 09:50:25 +02:00
Adriane Boyd
f1197d9175
Add API docs for token attribute symbols (#10836)
* Add API docs for token attribute symbols

* Remove NBSP's

* Fix typo

* Rephrase

Co-authored-by: svlandeg <svlandeg@github.com>
2022-06-23 08:16:38 +02:00