Paul O'Leary McCann
baeb35f31b
Add type annotations for internal models
2022-07-11 20:03:29 +09:00
Paul O'Leary McCann
4d032396b8
Merge branch 'feature/coref' into coref/dimension-inference
2022-07-11 19:18:46 +09:00
Paul O'Leary McCann
9cbb9702c0
Merge pull request #11042 from polm/fix/coref-alignment
...
Fix tokenization mismatch handling in coref
2022-07-11 19:15:05 +09:00
Paul O'Leary McCann
6d9eafeb37
Merge branch 'feature/coref' into fix/coref-alignment
2022-07-11 19:14:37 +09:00
Paul O'Leary McCann
2c2791daa5
Merge pull request #11087 from polm/coref/doc-update
...
Update Coref Docs
2022-07-11 19:03:14 +09:00
Paul O'Leary McCann
2eee0d248e
Fix types
...
mypy now exits without an error, except for two apparently unrelated
ones about setup.py.
2022-07-08 18:29:14 +09:00
Paul O'Leary McCann
da81a90d64
Span predictor leftovers
2022-07-06 19:29:27 +09:00
Paul O'Leary McCann
b0800ea855
Do dimension inference in span predictor
2022-07-06 19:22:37 +09:00
Paul O'Leary McCann
b59b924e49
Use normal PyTorchWrapper in coref
2022-07-06 19:22:19 +09:00
Paul O'Leary McCann
f67c1735c5
Remove tok2vec_size from coref
2022-07-06 18:58:57 +09:00
Paul O'Leary McCann
bd17c38b74
It works!
...
Was missing the serialization-related code from biaffine.
2022-07-06 18:58:22 +09:00
Paul O'Leary McCann
ba1bf8ae72
First take at dimension inference
...
This follows the pattern used in the Biaffine Parser, which uses an init
function to get the size only after the tok2vec is available.
This works at first, but serialization fails with an error.
2022-07-06 18:40:05 +09:00
Paul O'Leary McCann
ce49136458
Update NotImplementedError for coref component
2022-07-06 17:28:15 +09:00
Paul O'Leary McCann
5e405738d2
Update span predictor docstrings
2022-07-06 17:28:05 +09:00
Paul O'Leary McCann
c4de3e51a2
Remove old TODOs
2022-07-06 17:23:41 +09:00
Paul O'Leary McCann
da9c379355
Update docs
...
Parameter names in architecture docs were not updated after parameters
were renamed.
2022-07-06 17:13:31 +09:00
Paul O'Leary McCann
6f5cf838ec
Remove _spans_to_offsets
...
Basically the same as get_clusters_from_doc
2022-07-06 14:05:05 +09:00
Paul O'Leary McCann
8f598d7b01
Feedback from code review
2022-07-06 14:03:09 +09:00
Paul O'Leary McCann
63e27b5e44
Update spacy/ml/models/coref_util.py
...
Co-authored-by: kadarakos <kadar.akos@gmail.com>
2022-07-06 13:46:02 +09:00
Paul O'Leary McCann
178feae00a
Add tests to give up with whitespace differences
...
Docs in Examples are allowed to have arbitrarily different whitespace.
Handling that properly would be nice but isn't required, so for now
check for it and blow up.
2022-07-04 19:37:42 +09:00
Paul O'Leary McCann
c7f333d593
Rename spans2ints > _spans_to_offsets
2022-07-04 19:28:35 +09:00
Paul O'Leary McCann
b09bbc7f5e
Fix alignment issues
...
I believe this resolves issues with tokenization mismatches.
2022-07-03 20:11:03 +09:00
Paul O'Leary McCann
cf33b48fe0
Update tests
2022-07-03 20:10:53 +09:00
Paul O'Leary McCann
fd574a89c4
Update overfitting test
2022-07-03 19:34:15 +09:00
Paul O'Leary McCann
a46bc03abb
Add failing test with tokenization mismatch
...
This test only fails due to the explicit assert False at the moment,
but the debug output shows that the learned spans are all off by one due
to misalignment. So the code still needs fixing.
2022-07-03 16:01:27 +09:00
Paul O'Leary McCann
619b1102e6
Use config to specify tok2vec_size
2022-07-03 15:32:35 +09:00
Paul O'Leary McCann
1a4dbb702d
Add basic span predictor tests
2022-07-03 15:13:15 +09:00
Paul O'Leary McCann
201731df2d
Move spans2ints to util
2022-07-03 15:12:53 +09:00
Paul O'Leary McCann
1dacecbbfb
Run black
2022-07-03 14:49:02 +09:00
Paul O'Leary McCann
5192ac1617
Clean tests.
2022-07-03 14:48:42 +09:00
Paul O'Leary McCann
79720886fa
Merge branch 'feature/coref' into fix/coref-alignment
...
Had to renumber error message.
2022-07-01 19:09:29 +09:00
Paul O'Leary McCann
c59aeeb0ae
Merge pull request #11043 from kadarakos/feature/coref
...
Merging master into Feature/coref
2022-07-01 19:04:21 +09:00
Paul O'Leary McCann
dd812ca84a
Handle case with nothing to score in span predictor
...
This case was not handled correctly. It may be desirable to make changes
in the coref component to make sure this doesn't happen, but the span
predictor should also handle this kind of data intelligently internally.
Note that something is still weird because the span predictor seems to
not be learning.
2022-06-29 19:30:37 +09:00
kadarakos
0076f0f617
span predictor device fix
2022-06-29 06:58:47 +00:00
kadarakos
1a782592c4
make sure same device
2022-06-28 12:53:20 +00:00
kadarakos
9f9453865a
Merge branch 'master' into feature/coref
2022-06-28 10:27:35 +00:00
Paul O'Leary McCann
d1ff933e9b
Test works
...
This may not be done yet, as the test only checks consistency and does
not overfit correctly yet.
2022-06-28 19:15:33 +09:00
Paul O'Leary McCann
ef5762d78e
Bad hack to get tests to run
...
This changes the tok2vec size in coref to hardcoded 64 to get tests to
run. This should be reverted and hopefully replaced with proper shape
inference.
2022-06-28 19:06:13 +09:00
Paul O'Leary McCann
af6d5ae2fe
Initial test of mismatched tokenization
...
This runs, but the results are nonsense because the indices are off.
2022-06-28 19:05:47 +09:00
Eric Holscher
308a612ec9
Remove simply (#11017)
...
I was reading this page, and as a relative beginner, nothing about it was simple :)
2022-06-27 09:45:22 +02:00
github-actions[bot]
4155a59d47
Auto-format code with black (#11022)
...
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-06-27 09:35:35 +02:00
Adriane Boyd
738b38064f
Merge pull request #11021 from adrianeboyd/chore/v3.4.0
...
Set version to v3.4.0
2022-06-24 14:54:16 +02:00
Madeesh Kannan
8f1ba4de58
Backport parser/alignment optimizations from feature/refactor-parser (#10952)
2022-06-24 13:39:52 +02:00
Adriane Boyd
d9320db7db
Temporarily skip tests that require models/compat
2022-06-24 11:20:53 +02:00
Adriane Boyd
bffe54d02b
Set version to v3.4.0
2022-06-24 08:48:58 +02:00
Peter Baumgartner
9738b69c0e
Update Code Conventions.md (#11018)
2022-06-24 15:11:29 +09:00
Dmytro Sadovnychyi
4cd8b4cc22
Fix some of the broken links on universe pages (#11011)
...
Currently some of the "AUTHOR INFO" links (e.g. here[0]) are broken:
```
https://github.com/https://github.com/explosion
```
[0] https://spacy.io/universe/project/spacy-experimental
One link also remains broken: `https://szegedai.github.io/`.
2022-06-23 17:53:00 +02:00
Sofie Van Landeghem
f8116078ce
disable failing test because Stanford servers are down (#11015)
2022-06-23 10:57:46 +02:00
Adriane Boyd
d4e3f43639
Update thinc version to switch back to blis v0.7 (#11014)
2022-06-23 09:50:25 +02:00
Adriane Boyd
f1197d9175
Add API docs for token attribute symbols (#10836)
...
* Add API docs for token attribute symbols
* Remove NBSPs
* Fix typo
* Rephrase
Co-authored-by: svlandeg <svlandeg@github.com>
2022-06-23 08:16:38 +02:00