spaCy/spacy
Matthew Honnibal f0ec7bcb79
Flag to ignore examples with mismatched raw/gold text (#4534)
* Flag to ignore examples with mismatched raw/gold text

After #4525, we're seeing some alignment failures on our OntoNotes data. I think we actually have fixes for most of these cases.

In general it's better to fix the data, but it seems good to allow the GoldCorpus class to just skip cases where the raw text doesn't
match up to the gold words. I think previously we were silently ignoring these cases.

* Try to fix test on Python 2.7
2019-10-28 11:40:12 +01:00
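
The skip behaviour described in the commit message above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in gold.pyx: the helper names `text_matches_words` and `iter_examples`, the `ignore_misaligned` parameter, and the whitespace-insensitive comparison are all hypothetical stand-ins for whatever check GoldCorpus actually performs.

```python
def text_matches_words(raw_text, gold_words):
    """Return True if raw text and gold tokens agree once whitespace is ignored.

    Hypothetical check: the real alignment logic may be stricter or looser.
    """
    raw_chars = "".join(raw_text.split())
    gold_chars = "".join("".join(word.split()) for word in gold_words)
    return raw_chars == gold_chars


def iter_examples(examples, ignore_misaligned=False):
    """Yield (raw_text, gold_words) pairs whose raw text matches the gold words.

    With ignore_misaligned=False a mismatch raises an error; with
    ignore_misaligned=True the mismatched example is skipped.
    """
    for raw_text, gold_words in examples:
        if text_matches_words(raw_text, gold_words):
            yield raw_text, gold_words
        elif not ignore_misaligned:
            raise ValueError("Raw text does not match gold words: %r" % (raw_text,))
        # Otherwise fall through: the mismatched example is dropped.


examples = [
    ("Hello world!", ["Hello", "world", "!"]),
    ("Hello wrld!", ["Hello", "world", "!"]),  # raw text has a typo
]
kept = list(iter_examples(examples, ignore_misaligned=True))
```

Raising by default keeps data problems visible, which fits the commit's remark that fixing the data is generally better; the flag is an explicit opt-in to skipping.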
cli Flag to ignore examples with mismatched raw/gold text (#4534) 2019-10-28 11:40:12 +01:00
data Make spacy/data a package 2017-03-18 20:04:22 +01:00
displacy Move lookup tables out of the core library (#4346) 2019-10-01 00:01:27 +02:00
lang Tidy up and auto-format [ci skip] 2019-10-24 16:20:48 +02:00
matcher Implement new API for {Phrase}Matcher.add (backwards-compatible) (#4522) 2019-10-25 22:21:08 +02:00
ml Remove print 2019-10-27 22:24:19 +01:00
pipeline Fix --gold-preproc train cli command (#4392) 2019-10-27 21:58:50 +01:00
syntax Clarify parser model CPU/GPU code (#4535) 2019-10-27 23:43:09 +01:00
tests Fix tests for gpu 2019-10-27 22:19:18 +01:00
tokens Use numpy.frombuffer instead of fromstring 2019-10-24 16:18:41 +02:00
__init__.pxd * Seems to be working after refactor. Need to wire up more POS tag features, and wire up save/load of POS tags. 2014-10-24 02:23:42 +11:00
__init__.py Component decorator and component analysis (#4517) 2019-10-27 13:35:49 +01:00
__main__.py Update __main__.py 2019-03-20 09:43:26 +01:00
_align.pyx Improve alignment around quotes 2018-08-16 01:04:34 +02:00
_ml.py Refactor Tok2Vec to use architecture registry (#4518) 2019-10-25 22:28:20 +02:00
about.py Set version to v2.2.2.dev1 2019-10-22 20:11:25 +02:00
analysis.py Component decorator and component analysis (#4517) 2019-10-27 13:35:49 +01:00
attrs.pxd Fix attrs alignment 2019-07-12 17:59:47 +02:00
attrs.pyx Bugfix initializing DocBin with attributes (#4368) 2019-10-03 14:48:45 +02:00
compat.py Component decorator and component analysis (#4517) 2019-10-27 13:35:49 +01:00
errors.py Component decorator and component analysis (#4517) 2019-10-27 13:35:49 +01:00
glossary.py Update tag maps and docs for English and German (#4501) 2019-10-24 12:56:05 +02:00
gold.pxd Merge changes from master 2019-08-21 14:18:52 +02:00
gold.pyx Flag to ignore examples with mismatched raw/gold text (#4534) 2019-10-28 11:40:12 +01:00
kb.pxd rename entity frequency 2019-07-19 17:40:28 +02:00
kb.pyx KB extensions and better parsing of WikiData (#4375) 2019-10-14 12:28:53 +02:00
language.py Fix --gold-preproc train cli command (#4392) 2019-10-27 21:58:50 +01:00
lemmatizer.py Refactor lemmatizer and data table integration (#4353) 2019-10-01 21:36:03 +02:00
lexeme.pxd 💫 Support lexical attributes in retokenizer attrs (closes #2390) (#3325) 2019-02-24 21:13:51 +01:00
lexeme.pyx Alphanumeric -> alphabetic [ci skip] 2019-10-06 13:30:01 +02:00
lookups.py Refactor lemmatizer and data table integration (#4353) 2019-10-01 21:36:03 +02:00
morphology.pxd annotate kb_id through ents in doc 2019-03-22 11:36:44 +01:00
morphology.pyx Improve Morphology errors (#4314) 2019-09-21 14:37:06 +02:00
parts_of_speech.pxd Add support for Universal Dependencies v2.0 2017-03-03 13:17:34 +01:00
parts_of_speech.pyx Tidy up rest 2017-10-27 21:07:59 +02:00
scorer.py Tidy up and auto-format 2019-10-18 11:27:38 +02:00
strings.pxd Try to fix StringStore clean up (see #1506) 2017-11-11 03:11:27 +03:00
strings.pyx Merge branch 'master' into feature/lemmatizer 2019-03-16 13:44:22 +01:00
structs.pxd Replace Entity/MatchStruct with SpanC (#4459) 2019-10-18 11:01:47 +02:00
symbols.pxd Fix symbol alignment 2019-07-12 17:48:38 +02:00
symbols.pyx ensure Span.as_doc keeps the entity links + unit test 2019-06-25 15:28:51 +02:00
tokenizer.pxd Flush tokenizer cache when necessary (#4258) 2019-09-08 20:52:46 +02:00
tokenizer.pyx prevent zero-length mem alloc (#4429) 2019-10-22 16:54:33 +02:00
typedefs.pxd Work on changing StringStore to return hashes. 2017-05-28 12:36:27 +02:00
typedefs.pyx Tidy up rest 2017-10-27 21:07:59 +02:00
util.py Component decorator and component analysis (#4517) 2019-10-27 13:35:49 +01:00
vectors.pyx Clip most_similar to range [-1, 1] (fixes #4506) (#4507) 2019-10-22 20:10:42 +02:00
vocab.pxd 💫 WIP: Basic lookup class scaffolding and JSON for all lemmati… (#4167) 2019-08-22 14:21:32 +02:00
vocab.pyx Update vocab.get_vector docs to include features on Fasttext ngram (#4464) 2019-10-20 01:28:18 +02:00