Mirror of https://github.com/explosion/spaCy.git (synced 2025-11-19 17:26:01 +03:00)
The attribute `DocBin.strings` is a set. In `DocBin.get_docs`, the given vocab is updated by iterating over this set. Iterating over a Python set yields an arbitrary order, so the vocab is updated non-deterministically. When training (fine-tuning) a spaCy model, the base model's vocabulary is updated with the new vocabulary from the training data in exactly this way. After serialization, the entries in `model/vocab/strings.json` therefore appear in an arbitrary order, which prevents reproducible model training.

| Name |
|---|
| __init__.pxd |
| __init__.py |
| _dict_proxies.py |
| _retokenize.pyx |
| _serialize.py |
| doc.pxd |
| doc.pyx |
| graph.pxd |
| graph.pyx |
| morphanalysis.pxd |
| morphanalysis.pyx |
| span_group.pxd |
| span_group.pyx |
| span.pxd |
| span.pyx |
| token.pxd |
| token.pyx |
| underscore.py |
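
The ordering problem described in the commit message above can be illustrated without spaCy. For `str` elements, the iteration order of a Python set depends on string hashes, which are randomized per process unless `PYTHONHASHSEED` is fixed, so a store filled by iterating over a set can end up in a different order on each run. The sketch below uses a hypothetical `ToyStringStore` class and `update_store` helper (neither is part of spaCy) and shows one possible remedy, sorting the strings before adding them. It is an illustration under these assumptions, not necessarily the change made in this commit.

```python
class ToyStringStore:
    """Hypothetical stand-in for a vocab string store (not spaCy's StringStore).

    It only remembers the order in which new strings were added.
    """

    def __init__(self):
        self._strings = []
        self._seen = set()

    def add(self, string):
        # Ignore duplicates so each string keeps its first insertion position.
        if string not in self._seen:
            self._seen.add(string)
            self._strings.append(string)

    def to_list(self):
        return list(self._strings)


def update_store(store, strings):
    """Add strings in sorted order, independent of set iteration order."""
    for string in sorted(strings):
        store.add(string)
    return store


# A set of new strings, as it might be collected from training data.
new_strings = {"tokenizer", "vocab", "DocBin", "lemma"}

store = update_store(ToyStringStore(), new_strings)
print(store.to_list())  # ['DocBin', 'lemma', 'tokenizer', 'vocab']
```

Because `sorted` fixes the order before anything is added, the resulting list is the same in every process, whatever order the set happens to iterate in.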