Commit Graph

15691 Commits

Author SHA1 Message Date
Paul O'Leary McCann
1b3db149df Merge branch 'fix/coref-alignment' into feature/coref 2022-07-11 19:12:03 +09:00
Paul O'Leary McCann
2c2791daa5
Merge pull request #11087 from polm/coref/doc-update
Update Coref Docs
2022-07-11 19:03:14 +09:00
Paul O'Leary McCann
2eee0d248e Fix types
mypy now exits without an error, except for two apparently unrelated
ones about setup.py.
2022-07-08 18:29:14 +09:00
Paul O'Leary McCann
ce49136458 Update NotImplementedError for coref component 2022-07-06 17:28:15 +09:00
Paul O'Leary McCann
5e405738d2 Update span predictor docstrings 2022-07-06 17:28:05 +09:00
Paul O'Leary McCann
c4de3e51a2 Remove old TODOs 2022-07-06 17:23:41 +09:00
Paul O'Leary McCann
da9c379355 Update docs
Parameter names in architecture docs were not updated after parameters
were renamed.
2022-07-06 17:13:31 +09:00
Paul O'Leary McCann
6f5cf838ec Remove _spans_to_offsets
Basically the same as get_clusters_from_doc
2022-07-06 14:05:05 +09:00
Paul O'Leary McCann
8f598d7b01 Feedback from code review 2022-07-06 14:03:09 +09:00
Paul O'Leary McCann
63e27b5e44
Update spacy/ml/models/coref_util.py
Co-authored-by: kadarakos <kadar.akos@gmail.com>
2022-07-06 13:46:02 +09:00
Paul O'Leary McCann
178feae00a Add tests to give up with whitespace differences
Docs in Examples are allowed to have arbitrarily different whitespace.
Handling that properly would be nice but isn't required, so for now
check for it and blow up.
2022-07-04 19:37:42 +09:00
Paul O'Leary McCann
c7f333d593 Rename spans2ints > _spans_to_offsets 2022-07-04 19:28:35 +09:00
Paul O'Leary McCann
b09bbc7f5e Fix alignment issues
I believe this resolves issues with tokenization mismatches.
2022-07-03 20:11:03 +09:00
Paul O'Leary McCann
cf33b48fe0 Update tests 2022-07-03 20:10:53 +09:00
Paul O'Leary McCann
fd574a89c4 Update overfitting test 2022-07-03 19:34:15 +09:00
Paul O'Leary McCann
a46bc03abb Add failing test with tokenization mismatch
This test only fails due to the explicit assert False at the moment,
but the debug output shows that the learned spans are all off by one due
to misalignment. So the code still needs fixing.
2022-07-03 16:01:27 +09:00
Paul O'Leary McCann
619b1102e6 Use config to specify tok2vec_size 2022-07-03 15:32:35 +09:00
Paul O'Leary McCann
1a4dbb702d Add basic span predictor tests 2022-07-03 15:13:15 +09:00
Paul O'Leary McCann
201731df2d Move spans2ints to util 2022-07-03 15:12:53 +09:00
Paul O'Leary McCann
1dacecbbfb Run black 2022-07-03 14:49:02 +09:00
Paul O'Leary McCann
5192ac1617 Clean tests. 2022-07-03 14:48:42 +09:00
Paul O'Leary McCann
79720886fa Merge branch 'feature/coref' into fix/coref-alignment
Had to renumber error message.
2022-07-01 19:09:29 +09:00
Paul O'Leary McCann
c59aeeb0ae
Merge pull request #11043 from kadarakos/feature/coref
Merging master into Feature/coref
2022-07-01 19:04:21 +09:00
Paul O'Leary McCann
dd812ca84a Handle case with nothing to score in span predictor
This case was not handled correctly. It may be desirable to make changes
in the coref component to make sure this doesn't happen, but the span
predictor should also handle this kind of data intelligently internally.

Note that something is still weird because the span predictor seems to
not be learning.
2022-06-29 19:30:37 +09:00
kadarakos
0076f0f617 span predictor device fix 2022-06-29 06:58:47 +00:00
kadarakos
1a782592c4 make sure same device 2022-06-28 12:53:20 +00:00
kadarakos
9f9453865a Merge branch 'master' into feature/coref 2022-06-28 10:27:35 +00:00
Paul O'Leary McCann
d1ff933e9b Test works
This may not be done yet, as the test is just for consistency, and not
overfitting correctly yet.
2022-06-28 19:15:33 +09:00
Paul O'Leary McCann
ef5762d78e Bad hack to get tests to run
This changes the tok2vec size in coref to hardcoded 64 to get tests to
run. This should be reverted and hopefully replaced with proper shape
inference.
2022-06-28 19:06:13 +09:00
Paul O'Leary McCann
af6d5ae2fe Initial test of mismatched tokenization
This runs, but the results are nonsense because the indices are off.
2022-06-28 19:05:47 +09:00
Eric Holscher
308a612ec9
Remove simply (#11017)
I was reading this page, and as a relative beginner, nothing about it was simple :)
2022-06-27 09:45:22 +02:00
github-actions[bot]
4155a59d47
Auto-format code with black (#11022)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-06-27 09:35:35 +02:00
Adriane Boyd
738b38064f
Merge pull request #11021 from adrianeboyd/chore/v3.4.0
Set version to v3.4.0
2022-06-24 14:54:16 +02:00
Madeesh Kannan
8f1ba4de58
Backport parser/alignment optimizations from feature/refactor-parser (#10952) 2022-06-24 13:39:52 +02:00
Adriane Boyd
d9320db7db Temporarily skip tests that require models/compat 2022-06-24 11:20:53 +02:00
Adriane Boyd
bffe54d02b Set version to v3.4.0 2022-06-24 08:48:58 +02:00
Peter Baumgartner
9738b69c0e
Update Code Conventions.md (#11018) 2022-06-24 15:11:29 +09:00
Dmytro Sadovnychyi
4cd8b4cc22
Fix some of the broken links on universe pages (#11011)
Currently some of the "AUTHOR INFO" links (e.g. here[0]) are broken:

```
https://github.com/https://github.com/explosion
```

[0] https://spacy.io/universe/project/spacy-experimental


Also one remains broken with `https://szegedai.github.io/`.
2022-06-23 17:53:00 +02:00
Sofie Van Landeghem
f8116078ce
disable failing test because Stanford servers are down (#11015) 2022-06-23 10:57:46 +02:00
Adriane Boyd
d4e3f43639
Update thinc version to switch back to blis v0.7 (#11014) 2022-06-23 09:50:25 +02:00
Adriane Boyd
f1197d9175
Add API docs for token attribute symbols (#10836)
* Add API docs for token attribute symbols

* Remove NBSP's

* Fix typo

* Rephrase

Co-authored-by: svlandeg <svlandeg@github.com>
2022-06-23 08:16:38 +02:00
Peter Baumgartner
3335bb9d0c
remove cuda116 extra from install widget (#11012) 2022-06-23 08:15:28 +02:00
jademlc
bed23ff291
Update serialization methods code block (#11004)
* Update serialization methods code block

* Update website/docs/usage/saving-loading.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-06-22 20:45:26 +02:00
Paul O'Leary McCann
16894e665d
Refactor Coval Scoring code (#10875)
* Move coref scoring code to scorer.py

Includes some renames to make names less generic.

* Refactor coval code to remove ternary expressions

* Black formatting

* Add header

* Make scorers into registered scorers

* Small test fixes

* Skip coref tests when torch not present

Coref can't be loaded without Torch, so nothing works.

* Fix remaining type issues

Some of this just involves ignoring types in thorny areas. Two main
issues:

1. Some things have weird types due to indirection/argskwargs
2. xp2torch return type seems to have changed at some point

* Update spacy/scorer.py

Co-authored-by: kadarakos <kadar.akos@gmail.com>

* Small changes from review

* Be specific about the ValueError

* Type fix

Co-authored-by: kadarakos <kadar.akos@gmail.com>
2022-06-22 16:05:52 +09:00
Sofie Van Landeghem
0fa004c4cd the 'new' indicator wants a 'number' (#10997) 2022-06-21 22:01:16 +02:00
Philip Vollet
1ae13b2a70
Merge pull request #10991 from Lucaterre/master
updated spacy universe for spacyfishing
2022-06-21 10:33:26 +02:00
Daniël de Kok
0271306f16
Use thinc-apple-ops>=0.1.0.dev0 with apple extras (#10904)
* Use thinc-apple-ops>=0.1.0.dev0 with `apple` extras

Also test with thinc-apple-ops that is at least 0.1.0.dev0.

* Check thinc-apple-ops on macOS with Python 3.10

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Use `pip install --pre` for installing thinc-apple-ops in CI

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-06-21 08:26:59 +02:00
Victoria
a08ca064e5
Update linguistic-features.md (#10993)
Change link for downloading fasttext word vectors
2022-06-21 15:03:41 +09:00
Lucaterre
2820d7dd8d correct typo in universe.json for 'code_example' key: pipe name 'entityfishing' 2022-06-20 15:26:23 +02:00
Lucaterre
cdad815c68 updated spacy universe for spacyfishing 2022-06-20 14:28:49 +02:00