Commit Graph

9 Commits

Author SHA1 Message Date
Paul O'Leary McCann
6f5cf838ec Remove _spans_to_offsets
Basically the same as get_clusters_from_doc
2022-07-06 14:05:05 +09:00
Paul O'Leary McCann
8f598d7b01 Feedback from code review 2022-07-06 14:03:09 +09:00
Paul O'Leary McCann
178feae00a Add tests to give up with whitespace differences
Docs in Examples are allowed to have arbitrarily different whitespace.
Handling that properly would be nice but isn't required, but for now
check for it and blow up.
2022-07-04 19:37:42 +09:00
Paul O'Leary McCann
c7f333d593 Rename spans2ints > _spans_to_offsets 2022-07-04 19:28:35 +09:00
Paul O'Leary McCann
cf33b48fe0 Update tests 2022-07-03 20:10:53 +09:00
Paul O'Leary McCann
fd574a89c4 Update overfitting test 2022-07-03 19:34:15 +09:00
Paul O'Leary McCann
a46bc03abb Add failing test with tokenization mismatch
This test only fails due to the explicity assert False at the moment,
but the debug output shows that the learned spans are all off by one due
to misalignment. So the code still needs fixing.
2022-07-03 16:01:27 +09:00
Paul O'Leary McCann
619b1102e6 Use config to specify tok2vec_size 2022-07-03 15:32:35 +09:00
Paul O'Leary McCann
1a4dbb702d Add basic span predictor tests 2022-07-03 15:13:15 +09:00