spaCy/spacy/training
Matthew Honnibal 6f5e308d17
Support negative examples in partial NER annotations (#8106)
* Support a cfg field in transition system

* Make NER 'has gold' check use right alignment for span

* Pass 'negative_samples_key' property into NER transition system

* Add field for negative samples to NER transition system

* Check neg_key in NER has_gold

* Support negative examples in NER oracle

* Test for negative examples in NER

* Fix name of config variable in NER

* Remove vestiges of old-style partial annotation

* Remove obsolete tests

* Add comment noting lack of support for negative samples in parser

* Additions to "neg examples" PR (#8201)

* add custom error and test for deprecated format

* add test for unlearning an entity

* add break also for Begin's cost

* add negative_samples_key property on Parser

* rename

* extend docs & fix some older docs issues

* add subclass constructors, clean up tests, fix docs

* add flaky test with ValueError if gold parse was not found

* remove ValueError if n_gold == 0

* fix docstring

* Hack in environment variables to try out training

* Remove hack

* Remove NER hack, and support 'negative O' samples

* Fix O oracle

* Fix transition parser

* Remove 'not O' from oracle

* Fix NER oracle

* check for spans in both gold.ents and gold.spans and raise if so, to prevent memory access violation

* use set instead of list in consistency check

Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-06-17 17:33:00 +10:00
converters Fix parser sourcing in NER converter (#7631) 2021-04-08 12:25:03 +02:00
__init__.pxd Renaming gold & annotation_setter (#6042) 2020-09-09 10:31:03 +02:00
__init__.py Add callback to copy vocab/tokenizer from model (#7750) 2021-04-22 12:36:50 +02:00
align.pyx Fix alignment for 1-to-1 tokens and lowercasing (#6476) 2020-12-08 14:25:16 +08:00
alignment.py Replace pytokenizations with internal alignment (#6293) 2020-11-03 16:24:38 +01:00
augment.py Fix lowercase augmentation (#7336) 2021-03-09 14:02:32 +11:00
batchers.py ensure tolerance is properly passed on (#8158) 2021-05-27 18:10:28 +10:00
callbacks.py Add callback to copy vocab/tokenizer from model (#7750) 2021-04-22 12:36:50 +02:00
corpus.py Make JsonlReader path optional (#8396) 2021-06-15 14:55:15 +02:00
example.pxd Make a pre-check to speed up alignment cache (#6139) 2020-09-24 18:13:39 +02:00
example.pyx Support negative examples in partial NER annotations (#8106) 2021-06-17 17:33:00 +10:00
gold_io.pyx Fix is_sent_start when converting from JSON (fix #7635) (#7655) 2021-04-08 18:24:52 +10:00
initialize.py Support large/infinite training corpora (#7208) 2021-04-08 18:08:04 +10:00
iob_utils.py fix docs (#8200) 2021-05-27 10:48:59 +02:00
loggers.py W&B integration: Optional support for dataset and model checkpoint logging and versioning (#7429) 2021-04-01 19:36:23 +02:00
loop.py Add training option to set annotations on update (#7767) 2021-04-26 16:53:53 +02:00
pretrain.py replace "is not" with != 2021-03-18 21:09:11 +01:00