spaCy/spacy/cli
Matthew Honnibal f0ec7bcb79
Flag to ignore examples with mismatched raw/gold text (#4534)
* Flag to ignore examples with mismatched raw/gold text

After #4525, we're seeing some alignment failures on our OntoNotes data. I think we actually have fixes for most of these cases.

In general it's better to fix the data, but it seems good to allow the GoldCorpus class to just skip cases where the raw text doesn't match up to the gold words. I think previously we were silently ignoring these cases.

* Try to fix test on Python 2.7
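
A minimal sketch of the skip-on-mismatch idea described in the commit message, in plain Python. This is not the GoldCorpus implementation: the function names, the `ignore_misaligned` parameter, and the whitespace-insensitive comparison are all assumptions made for illustration.

```python
def text_matches_words(raw_text, gold_words):
    """Hypothetical check: compare raw text and gold tokens with whitespace removed."""
    raw_chars = "".join(raw_text.split())
    gold_chars = "".join("".join(word.split()) for word in gold_words)
    return raw_chars == gold_chars


def iter_examples(examples, ignore_misaligned=False):
    """Yield (raw_text, gold_words) pairs whose text and tokens agree.

    With ignore_misaligned=True (an assumed flag name), mismatched pairs are
    skipped; otherwise the first mismatch raises an error.
    """
    for raw_text, gold_words in examples:
        if text_matches_words(raw_text, gold_words):
            yield raw_text, gold_words
        elif not ignore_misaligned:
            raise ValueError("Raw text does not match gold words: %r" % raw_text)
        # Otherwise fall through and drop the mismatched example.


examples = [
    ("Hello world!", ["Hello", "world", "!"]),
    ("New York", ["New", "Yrok"]),  # deliberately mismatched
]
kept = list(iter_examples(examples, ignore_misaligned=True))
```

Raising by default keeps bad data visible, while the opt-in flag lets training continue on a corpus with a few known misalignments.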
2019-10-28 11:40:12 +01:00
converters Auto-format [ci skip] 2019-10-24 16:21:08 +02:00
__init__.py Move UD scripts to bin 2019-03-20 01:19:34 +01:00
_schemas.py Store JSON schemas in Python and tidy up (#3235) 2019-02-07 19:44:31 +11:00
convert.py Auto-format [ci skip] 2019-10-24 16:21:08 +02:00
debug_data.py Checks/errors related to ill-formed IOB input in CLI convert and debug-data (#4487) 2019-10-21 12:20:28 +02:00
download.py Improve usage of pkg_resources and handling of entry points (#4387) 2019-10-07 17:22:09 +02:00
evaluate.py Add textcat to train CLI (#4226) 2019-09-15 22:31:31 +02:00
info.py Small CLI improvements (#3030) 2018-12-08 11:49:43 +01:00
init_model.py Support model name in init-model 2019-09-26 03:01:32 +02:00
link.py Small CLI improvements (#3030) 2018-12-08 11:49:43 +01:00
package.py Also support "requirements" in model.json 2019-07-27 13:34:57 +02:00
pretrain.py KB extensions and better parsing of WikiData (#4375) 2019-10-14 12:28:53 +02:00
profile.py pulling tqdm imports in functions to avoid bug (tmp fix) (#4263) 2019-09-09 16:32:11 +02:00
train.py Flag to ignore examples with mismatched raw/gold text (#4534) 2019-10-28 11:40:12 +01:00
validate.py Improve usage of pkg_resources and handling of entry points (#4387) 2019-10-07 17:22:09 +02:00