mirror of https://github.com/explosion/spaCy.git (synced 2025-04-25 03:13:41 +03:00)
The attribute `DocBin.strings` is a set. In `DocBin.get_docs`, the given vocab is updated by iterating over this set. Iteration over a Python set yields an arbitrary order (for strings, it depends on the per-process hash seed), so the vocab is updated non-deterministically. When training (fine-tuning) a spaCy model, the base model's vocabulary is updated with the new strings from the training data in exactly this way. After serialization, the entries in `model/vocab/strings.json` therefore appear in an arbitrary order, which prevents reproducible model training.
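The effect can be reproduced without spaCy. The sketch below is illustrative only: it stands in for `DocBin.strings` with a plain set and for the vocab's string store with a plain list, and the names `STRINGS`, `SNIPPET` and `insertion_order` are hypothetical. Each run starts a fresh interpreter with a different `PYTHONHASHSEED`, mimicking separate training runs. Iterating over `sorted(strings)` is shown as one plausible way to make the insertion order deterministic; it is an assumption here, not a quote of the actual change.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for the strings collected in DocBin.strings.
STRINGS = ["apple", "banana", "cherry", "durian", "elderberry", "fig", "grape"]

# Code run in a child interpreter: build a set, then "add" its members to an
# ordered store (a list standing in for the vocab's string insertion order).
SNIPPET = """
import sys
strings = set(sys.argv[2:])
deterministic = sys.argv[1] == "sorted"
store = []
for s in (sorted(strings) if deterministic else strings):
    store.append(s)
print(",".join(store))
"""


def insertion_order(seed: int, deterministic: bool) -> str:
    """Return the store order produced in a fresh process with the given hash seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    mode = "sorted" if deterministic else "set"
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET, mode, *STRINGS],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    seeds = range(5)
    # Set iteration: the order may differ from one process (seed) to the next.
    unsorted_orders = {insertion_order(s, deterministic=False) for s in seeds}
    # Sorted iteration: the order is the same regardless of the hash seed.
    sorted_orders = {insertion_order(s, deterministic=True) for s in seeds}
    print("distinct orders from set iteration:", len(unsorted_orders))
    print("distinct orders from sorted iteration:", len(sorted_orders))
```

Sorting trades a small O(n log n) cost at load time for an order that depends only on the strings themselves, which is what a byte-for-byte reproducible `strings.json` needs.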
- `__init__.pxd`
- `__init__.py`
- `_dict_proxies.py`
- `_retokenize.pyx`
- `_serialize.py`
- `doc.pxd`
- `doc.pyx`
- `graph.pxd`
- `graph.pyx`
- `morphanalysis.pxd`
- `morphanalysis.pyx`
- `span_group.pxd`
- `span_group.pyx`
- `span.pxd`
- `span.pyx`
- `token.pxd`
- `token.pyx`
- `underscore.py`