Paul O'Leary McCann
1b3db149df
Merge branch 'fix/coref-alignment' into feature/coref
2022-07-11 19:12:03 +09:00
Paul O'Leary McCann
2c2791daa5
Merge pull request #11087 from polm/coref/doc-update
...
Update Coref Docs
2022-07-11 19:03:14 +09:00
Paul O'Leary McCann
2eee0d248e
Fix types
...
mypy now exits without an error, except for two apparently unrelated
ones about setup.py.
2022-07-08 18:29:14 +09:00
Paul O'Leary McCann
ce49136458
Update NotImplementedError for coref component
2022-07-06 17:28:15 +09:00
Paul O'Leary McCann
5e405738d2
Update span predictor docstrings
2022-07-06 17:28:05 +09:00
Paul O'Leary McCann
c4de3e51a2
Remove old TODOs
2022-07-06 17:23:41 +09:00
Paul O'Leary McCann
da9c379355
Update docs
...
Parameter names in architecture docs were not updated after parameters
were renamed.
2022-07-06 17:13:31 +09:00
Paul O'Leary McCann
6f5cf838ec
Remove _spans_to_offsets
...
Basically the same as get_clusters_from_doc
2022-07-06 14:05:05 +09:00
Paul O'Leary McCann
8f598d7b01
Feedback from code review
2022-07-06 14:03:09 +09:00
Paul O'Leary McCann
63e27b5e44
Update spacy/ml/models/coref_util.py
...
Co-authored-by: kadarakos <kadar.akos@gmail.com>
2022-07-06 13:46:02 +09:00
Paul O'Leary McCann
178feae00a
Add tests to give up with whitespace differences
...
Docs in Examples are allowed to have arbitrarily different whitespace.
Handling that properly would be nice but isn't required; for now,
check for it and blow up.
2022-07-04 19:37:42 +09:00
Paul O'Leary McCann
c7f333d593
Rename spans2ints > _spans_to_offsets
2022-07-04 19:28:35 +09:00
Paul O'Leary McCann
b09bbc7f5e
Fix alignment issues
...
I believe this resolves issues with tokenization mismatches.
2022-07-03 20:11:03 +09:00
Paul O'Leary McCann
cf33b48fe0
Update tests
2022-07-03 20:10:53 +09:00
Paul O'Leary McCann
fd574a89c4
Update overfitting test
2022-07-03 19:34:15 +09:00
Paul O'Leary McCann
a46bc03abb
Add failing test with tokenization mismatch
...
This test only fails due to the explicit assert False at the moment,
but the debug output shows that the learned spans are all off by one due
to misalignment. So the code still needs fixing.
2022-07-03 16:01:27 +09:00
Paul O'Leary McCann
619b1102e6
Use config to specify tok2vec_size
2022-07-03 15:32:35 +09:00
Paul O'Leary McCann
1a4dbb702d
Add basic span predictor tests
2022-07-03 15:13:15 +09:00
Paul O'Leary McCann
201731df2d
Move spans2ints to util
2022-07-03 15:12:53 +09:00
Paul O'Leary McCann
1dacecbbfb
Run black
2022-07-03 14:49:02 +09:00
Paul O'Leary McCann
5192ac1617
Clean tests.
2022-07-03 14:48:42 +09:00
Paul O'Leary McCann
79720886fa
Merge branch 'feature/coref' into fix/coref-alignment
...
Had to renumber error message.
2022-07-01 19:09:29 +09:00
Paul O'Leary McCann
c59aeeb0ae
Merge pull request #11043 from kadarakos/feature/coref
...
Merging master into Feature/coref
2022-07-01 19:04:21 +09:00
Paul O'Leary McCann
dd812ca84a
Handle case with nothing to score in span predictor
...
This case was not handled correctly. It may be desirable to make changes
in the coref component to make sure this doesn't happen, but the span
predictor should also handle this kind of data intelligently internally.
Note that something is still weird because the span predictor seems to
not be learning.
2022-06-29 19:30:37 +09:00
kadarakos
0076f0f617
span predictor device fix
2022-06-29 06:58:47 +00:00
kadarakos
1a782592c4
make sure same device
2022-06-28 12:53:20 +00:00
kadarakos
9f9453865a
Merge branch 'master' into feature/coref
2022-06-28 10:27:35 +00:00
Paul O'Leary McCann
d1ff933e9b
Test works
...
This may not be done yet, as the test is just for consistency, and not
overfitting correctly yet.
2022-06-28 19:15:33 +09:00
Paul O'Leary McCann
ef5762d78e
Bad hack to get tests to run
...
This changes the tok2vec size in coref to hardcoded 64 to get tests to
run. This should be reverted and hopefully replaced with proper shape
inference.
2022-06-28 19:06:13 +09:00
Paul O'Leary McCann
af6d5ae2fe
Initial test of mismatched tokenization
...
This runs, but the results are nonsense because the indices are off.
2022-06-28 19:05:47 +09:00
Eric Holscher
308a612ec9
Remove simply (#11017)
...
I was reading this page, and as a relative beginner, nothing about it was simple :)
2022-06-27 09:45:22 +02:00
github-actions[bot]
4155a59d47
Auto-format code with black (#11022)
...
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-06-27 09:35:35 +02:00
Adriane Boyd
738b38064f
Merge pull request #11021 from adrianeboyd/chore/v3.4.0
...
Set version to v3.4.0
2022-06-24 14:54:16 +02:00
Madeesh Kannan
8f1ba4de58
Backport parser/alignment optimizations from feature/refactor-parser (#10952)
2022-06-24 13:39:52 +02:00
Adriane Boyd
d9320db7db
Temporarily skip tests that require models/compat
2022-06-24 11:20:53 +02:00
Adriane Boyd
bffe54d02b
Set version to v3.4.0
2022-06-24 08:48:58 +02:00
Peter Baumgartner
9738b69c0e
Update Code Conventions.md (#11018)
2022-06-24 15:11:29 +09:00
Dmytro Sadovnychyi
4cd8b4cc22
Fix some of the broken links on universe pages (#11011)
...
Currently some of the "AUTHOR INFO" links (e.g. here[0]) are broken:
```
https://github.com/https://github.com/explosion
```
[0] https://spacy.io/universe/project/spacy-experimental
Also one remains broken with `https://szegedai.github.io/`.
2022-06-23 17:53:00 +02:00
Sofie Van Landeghem
f8116078ce
disable failing test because Stanford servers are down (#11015)
2022-06-23 10:57:46 +02:00
Adriane Boyd
d4e3f43639
Update thinc version to switch back to blis v0.7 (#11014)
2022-06-23 09:50:25 +02:00
Adriane Boyd
f1197d9175
Add API docs for token attribute symbols (#10836)
...
* Add API docs for token attribute symbols
* Remove NBSP's
* Fix typo
* Rephrase
Co-authored-by: svlandeg <svlandeg@github.com>
2022-06-23 08:16:38 +02:00
Peter Baumgartner
3335bb9d0c
remove cuda116 extra from install widget (#11012)
2022-06-23 08:15:28 +02:00
jademlc
bed23ff291
Update serialization methods code block (#11004)
...
* Update serialization methods code block
* Update website/docs/usage/saving-loading.md
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-06-22 20:45:26 +02:00
Paul O'Leary McCann
16894e665d
Refactor Coval Scoring code (#10875)
...
* Move coref scoring code to scorer.py
Includes some renames to make names less generic.
* Refactor coval code to remove ternary expressions
* Black formatting
* Add header
* Make scorers into registered scorers
* Small test fixes
* Skip coref tests when torch not present
Coref can't be loaded without Torch, so nothing works.
* Fix remaining type issues
Some of this just involves ignoring types in thorny areas. Two main
issues:
1. Some things have weird types due to indirection/ argskwargs
2. xp2torch return type seems to have changed at some point
* Update spacy/scorer.py
Co-authored-by: kadarakos <kadar.akos@gmail.com>
* Small changes from review
* Be specific about the ValueError
* Type fix
Co-authored-by: kadarakos <kadar.akos@gmail.com>
2022-06-22 16:05:52 +09:00
Sofie Van Landeghem
0fa004c4cd
the 'new' indicator wants a 'number' (#10997)
2022-06-21 22:01:16 +02:00
Philip Vollet
1ae13b2a70
Merge pull request #10991 from Lucaterre/master
...
updated spacy universe for spacyfishing
2022-06-21 10:33:26 +02:00
Daniël de Kok
0271306f16
Use thinc-apple-ops>=0.1.0.dev0 with `apple` extras (#10904)
...
* Use thinc-apple-ops>=0.1.0.dev0 with `apple` extras
Also test with thinc-apple-ops that is at least 0.1.0.dev0.
* Check thinc-apple-ops on macOS with Python 3.10
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Use `pip install --pre` for installing thinc-apple-ops in CI
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-06-21 08:26:59 +02:00
Victoria
a08ca064e5
Update linguistic-features.md (#10993)
...
Change link for downloading fasttext word vectors
2022-06-21 15:03:41 +09:00
Lucaterre
2820d7dd8d
correct typo in universe.json for 'code_example' key: pipe name 'entityfishing'
2022-06-20 15:26:23 +02:00
Lucaterre
cdad815c68
updated spacy universe for spacyfishing
2022-06-20 14:28:49 +02:00