mirror of https://github.com/explosion/spaCy.git (synced 2024-11-11 04:08:09 +03:00)
2516896849
* Make vocab update in get_docs deterministic
The attribute `DocBin.strings` is a set, and `DocBin.get_docs`
updates the given vocab by iterating over this set.
Iterating over a Python set yields an arbitrary order,
so the vocab is updated non-deterministically.
When training (fine-tuning) a spaCy model, the base model's
vocabulary is updated with the new vocabulary from the
training data in exactly the way described above. After
serialization, the entries in `model/vocab/strings.json`
therefore appear in an arbitrary order, which prevents
reproducible model training.
* Revert "Make vocab update in get_docs deterministic"
This reverts commit
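
The ordering problem described in the first commit can be illustrated without spaCy. The following is a minimal sketch, not the actual change made in `DocBin.get_docs`: `RecordingStore` and `update_strings_deterministically` are hypothetical names standing in for a vocab's string store and for the update loop. The sketch assumes that iterating over `sorted(strings)` rather than over the set itself is an acceptable way to fix the order.

```python
class RecordingStore:
    """Hypothetical stand-in for a string store; remembers insertion order."""

    def __init__(self):
        self.order = []

    def add(self, string):
        # Ignore duplicates, as a string store would.
        if string not in self.order:
            self.order.append(string)


def update_strings_deterministically(store, strings):
    """Add every string to `store` in sorted order.

    Iterating over the set directly would follow its internal hash order,
    which for str keys can differ between interpreter runs.
    """
    for string in sorted(strings):
        store.add(string)


store_a = RecordingStore()
store_b = RecordingStore()
update_strings_deterministically(store_a, {"token", "apple", "zebra"})
update_strings_deterministically(store_b, {"zebra", "token", "apple"})

# Both stores end up with the same insertion order.
print(store_a.order == store_b.order)
```

Sorting costs O(n log n) per call, but it makes the resulting order depend only on the contents of the set and not on hash seeding.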
| File |
|---|
| __init__.py |
| test_resource_warning.py |
| test_serialize_config.py |
| test_serialize_doc.py |
| test_serialize_extension_attrs.py |
| test_serialize_kb.py |
| test_serialize_language.py |
| test_serialize_pipeline.py |
| test_serialize_tokenizer.py |
| test_serialize_vocab_strings.py |