mirror of https://github.com/explosion/spaCy.git
synced 2024-11-14 21:57:15 +03:00
e48a09df4e
* OrigAnnot class instead of gold.orig_annot list of zipped tuples
* from_orig to replace from_annot_tuples
* rename to RawAnnot
* some unit tests for GoldParse creation and internal format
* removing orig_annot and switching to lists instead of tuple
* rewriting tuples to use RawAnnot (+ debug statements, WIP)
* fix pop() changing the data
* small fixes
* pop-append fixes
* return RawAnnot for existing GoldParse to have uniform interface
* clean up imports
* fix merge_sents
* add unit test for 4402 with new structure (not working yet)
* introduce DocAnnot
* typo fixes
* add unit test for merge_sents
* rename from_orig to from_raw
* fixing unit tests
* fix nn parser
* read_annots to produce text, doc_annot pairs
* _make_golds fix
* rename golds_to_gold_annots
* small fixes
* fix encoding
* have golds_to_gold_annots use DocAnnot
* missed a spot
* merge_sents as function in DocAnnot
* allow specifying only part of the token-level annotations
* refactor with Example class + underlying dicts
* pipeline components to work with Example objects (wip)
* input checking
* fix yielding
* fix calls to update
* small fixes
* fix scorer unit test with new format
* fix kwargs order
* fixes for ud and conllu scripts
* fix reading data for conllu script
* add in proper errors (not fixed numbering yet to avoid merge conflicts)
* fixing few more small bugs
* fix EL script
23 lines
629 B
Python
# coding: utf8
from __future__ import unicode_literals

from spacy.lang.en import English
from spacy.util import minibatch, compounding


def test_issue4348():
    """Test that training the tagger with empty data doesn't throw errors"""

    # Two examples, each with an empty text and an empty tag list
    TRAIN_DATA = [("", {"tags": []}), ("", {"tags": []})]

    nlp = English()
    tagger = nlp.create_pipe("tagger")
    nlp.add_pipe(tagger)

    optimizer = nlp.begin_training()
    for i in range(5):
        losses = {}
        batches = minibatch(TRAIN_DATA, size=compounding(4.0, 32.0, 1.001))
        for batch in batches:
            # The test passes as long as this call does not raise
            nlp.update(batch, sgd=optimizer, losses=losses)
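
The batching in the test above can be hard to picture without spaCy at hand. The following is a minimal standalone sketch of what `minibatch(TRAIN_DATA, size=compounding(4.0, 32.0, 1.001))` is assumed to do in spaCy v2: the two functions here are simplified re-implementations written for illustration, not spaCy's own code, and `train_data` simply mirrors the test's `TRAIN_DATA`.

```python
from itertools import islice


def compounding(start, stop, compound):
    """Yield start, start * compound, start * compound**2, ... capped at stop.

    Illustrative stand-in for spacy.util.compounding (assumed v2 semantics,
    increasing series only).
    """
    value = start
    while True:
        yield min(value, stop)
        value *= compound


def minibatch(items, size):
    """Group items into lists, drawing each batch size from the `size` iterator.

    Illustrative stand-in for spacy.util.minibatch (assumed v2 semantics).
    """
    sizes = iter(size)
    items = iter(items)
    while True:
        batch = list(islice(items, int(next(sizes))))
        if not batch:
            break
        yield batch


# Same shape as TRAIN_DATA in the test: empty texts with empty tag lists
train_data = [("", {"tags": []}), ("", {"tags": []})]
batches = list(minibatch(train_data, size=compounding(4.0, 32.0, 1.001)))

# The first batch size is 4, so both examples land in a single batch
print(len(batches), len(batches[0]))
```

Under these assumptions, each of the five training iterations in the test makes exactly one `nlp.update` call, and that call receives both empty examples at once.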