mirror of
https://github.com/explosion/spaCy.git
synced 2024-11-10 19:57:17 +03:00
4262f231c5
There are a billion "CoNLL" formats, depending on the tool producing them. The Stanford v3.3 converter has a few quirks that the CoNLL-X conversion wasn't handling:

* Sentences may have extra spacing in between the newlines.
* The coarse-grained POS is the same as the fine-grained POS, so we need a tag map to get the coarse-grained POS.

Needing the tag map is particularly unfortunate; it feels like something that should be patched in the source data. Adding the extra option may be confusing to people, especially since it *overwrites* the corpus tag.

project
templates
__init__.py
_util.py
convert.py
debug_config.py
debug_data.py
debug_model.py
download.py
evaluate.py
info.py
init_config.py
init_model.py
package.py
pretrain.py
profile.py
train.py
validate.py
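
The two quirks named in the commit message (extra blank lines between sentences, and a coarse-grained POS column that merely repeats the fine-grained tag) can be illustrated with a minimal sketch. This is not the code in `convert.py`: the function name `read_conll_sentences`, the tiny `TAG_MAP`, and the sample data are all hypothetical, and the column positions assume the standard CoNLL-X layout (ID, FORM, LEMMA, CPOSTAG, POSTAG, ...).

```python
import re

# Hypothetical, deliberately tiny tag map: fine-grained tag -> coarse-grained POS.
TAG_MAP = {"DT": "DET", "NN": "NOUN", "NNS": "NOUN", "VBZ": "VERB"}


def read_conll_sentences(text, tag_map=None):
    """Split CoNLL-X-style text into sentences, each a list of token dicts."""
    sentences = []
    # Treat any run of blank or whitespace-only lines as a single sentence break.
    for block in re.split(r"\n\s*\n", text.strip()):
        tokens = []
        for line in block.splitlines():
            line = line.strip()
            if not line:
                continue
            cols = line.split("\t")
            # Assumed CoNLL-X column order: ID, FORM, LEMMA, CPOSTAG, POSTAG, ...
            word, coarse, fine = cols[1], cols[3], cols[4]
            if tag_map is not None:
                # Note: this overwrites the coarse tag given in the corpus.
                # Tags missing from the map keep the corpus value.
                coarse = tag_map.get(fine, coarse)
            tokens.append({"word": word, "pos": coarse, "tag": fine})
        if tokens:
            sentences.append(tokens)
    return sentences


# Invented sample: both POS columns hold the same fine-grained tag, and the
# two sentences are separated by several blank lines, one containing spaces.
sample = (
    "1\tThe\t_\tDT\tDT\t_\t2\tdet\t_\t_\n"
    "2\tdog\t_\tNN\tNN\t_\t3\tnsubj\t_\t_\n"
    "3\tbarks\t_\tVBZ\tVBZ\t_\t0\troot\t_\t_\n"
    "\n   \n\n"
    "1\tCats\t_\tNNS\tNNS\t_\t2\tnsubj\t_\t_\n"
    "2\tsleep\t_\tVBP\tVBP\t_\t0\troot\t_\t_\n"
)

sentences = read_conll_sentences(sample, TAG_MAP)
```

Making the tag map an optional argument mirrors the concern raised in the commit message: the corpus tag is replaced only when the caller explicitly asks for it.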