spaCy/spacy
adrianeboyd 0c9640ced3 Replace old gold alignment with new gold alignment (#4710)
Replace the old gold alignment, which tolerated some noise in the alignment between raw and orth, with the new, simpler alignment, which requires the raw and orth strings to be identical except for whitespace and capitalization.

* Replace old alignment with new alignment, removing `_align.pyx` and
its tests
* Remove all quote normalizations
* Enable test for new align
  * Modify test case for quote normalization
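
To illustrate the stricter requirement described above, here is a minimal sketch of a character-based alignment between two tokenizations. It is not the code in `gold.pyx`; the names `char_owners` and `align_tokens` are hypothetical and chosen only for this example. It assumes the two token sequences spell out the same text once whitespace is dropped and case is folded, and raises an error otherwise.

```python
def char_owners(tokens):
    """Return the whitespace-free, lowercased text and, for each
    character in it, the index of the token it came from."""
    chars = []
    owners = []
    for i, token in enumerate(tokens):
        for ch in token.lower():
            if ch.isspace():
                continue
            chars.append(ch)
            owners.append(i)
    return "".join(chars), owners


def align_tokens(tokens_a, tokens_b):
    """Map each token index in tokens_a to the token indices in
    tokens_b that cover the same characters (hypothetical helper)."""
    text_a, owners_a = char_owners(tokens_a)
    text_b, owners_b = char_owners(tokens_b)
    if text_a != text_b:
        # No noise is tolerated: any difference other than
        # whitespace or capitalization is an error.
        raise ValueError("texts differ beyond whitespace and capitalization")
    a2b = [[] for _ in tokens_a]
    for tok_a, tok_b in zip(owners_a, owners_b):
        if not a2b[tok_a] or a2b[tok_a][-1] != tok_b:
            a2b[tok_a].append(tok_b)
    return a2b


# Example: one side merges "New York", the other splits "isn't".
mapping = align_tokens(["New", "York", "isn't"], ["new york", "is", "n't"])
```

With no quote normalization in this sketch, a pair such as ``` `` ``` versus `"` counts as a mismatch and raises `ValueError` instead of being aligned.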
2019-11-25 23:13:26 +01:00
cli Restructure Example with merged sents as default (#4632) 2019-11-25 16:03:28 +01:00
data Make spacy/data a package 2017-03-18 20:04:22 +01:00
displacy Generalize handling of tokenizer special cases (#4259) 2019-11-13 21:24:35 +01:00
lang Generalize handling of tokenizer special cases (#4259) 2019-11-13 21:24:35 +01:00
matcher Generalize handling of tokenizer special cases (#4259) 2019-11-13 21:24:35 +01:00
ml Generalize handling of tokenizer special cases (#4259) 2019-11-13 21:24:35 +01:00
pipeline Restructure Example with merged sents as default (#4632) 2019-11-25 16:03:28 +01:00
syntax Restructure Example with merged sents as default (#4632) 2019-11-25 16:03:28 +01:00
tests Replace old gold alignment with new gold alignment (#4710) 2019-11-25 23:13:26 +01:00
tokens Fix serialization of extension attr values in DocBin (#4540) 2019-10-28 16:02:13 +01:00
__init__.pxd * Seems to be working after refactor. Need to wire up more POS tag features, and wire up save/load of POS tags. 2014-10-24 02:23:42 +11:00
__init__.py Generalize handling of tokenizer special cases (#4259) 2019-11-13 21:24:35 +01:00
__main__.py Generalize handling of tokenizer special cases (#4259) 2019-11-13 21:24:35 +01:00
_ml.py Put Tok2Vec refactor behind feature flag (#4563) 2019-10-31 15:01:15 +01:00
about.py Set version to v2.2.2 2019-10-31 15:53:31 +01:00
analysis.py Support span._. in component decorator attrs (#4555) 2019-10-30 17:19:36 +01:00
attrs.pxd Fix attrs alignment 2019-07-12 17:59:47 +02:00
attrs.pyx Bugfix initializing DocBin with attributes (#4368) 2019-10-03 14:48:45 +02:00
compat.py Generalize handling of tokenizer special cases (#4259) 2019-11-13 21:24:35 +01:00
errors.py Generalize handling of tokenizer special cases (#4259) 2019-11-13 21:24:35 +01:00
glossary.py Update tag maps and docs for English and German (#4501) 2019-10-24 12:56:05 +02:00
gold.pxd Restructure Example with merged sents as default (#4632) 2019-11-25 16:03:28 +01:00
gold.pyx Replace old gold alignment with new gold alignment (#4710) 2019-11-25 23:13:26 +01:00
kb.pxd rename entity frequency 2019-07-19 17:40:28 +02:00
kb.pyx KB extensions and better parsing of WikiData (#4375) 2019-10-14 12:28:53 +02:00
language.py Restructure Example with merged sents as default (#4632) 2019-11-25 16:03:28 +01:00
lemmatizer.py Refactor lemmatizer and data table integration (#4353) 2019-10-01 21:36:03 +02:00
lexeme.pxd 💫 Support lexical attributes in retokenizer attrs (closes #2390) (#3325) 2019-02-24 21:13:51 +01:00
lexeme.pyx Alphanumeric -> alphabetic [ci skip] 2019-10-06 13:30:01 +02:00
lookups.py Refactor lemmatizer and data table integration (#4353) 2019-10-01 21:36:03 +02:00
morphology.pxd annotate kb_id through ents in doc 2019-03-22 11:36:44 +01:00
morphology.pyx Improve Morphology errors (#4314) 2019-09-21 14:37:06 +02:00
parts_of_speech.pxd Add support for Universal Dependencies v2.0 2017-03-03 13:17:34 +01:00
parts_of_speech.pyx Tidy up rest 2017-10-27 21:07:59 +02:00
scorer.py Restructure Example with merged sents as default (#4632) 2019-11-25 16:03:28 +01:00
strings.pxd Try to fix StringStore clean up (see #1506) 2017-11-11 03:11:27 +03:00
strings.pyx Merge branch 'master' into feature/lemmatizer 2019-03-16 13:44:22 +01:00
structs.pxd Replace Entity/MatchStruct with SpanC (#4459) 2019-10-18 11:01:47 +02:00
symbols.pxd Fix symbol alignment 2019-07-12 17:48:38 +02:00
symbols.pyx ensure Span.as_doc keeps the entity links + unit test 2019-06-25 15:28:51 +02:00
tokenizer.pxd Generalize handling of tokenizer special cases (#4259) 2019-11-13 21:24:35 +01:00
tokenizer.pyx Generalize handling of tokenizer special cases (#4259) 2019-11-13 21:24:35 +01:00
typedefs.pxd Work on changing StringStore to return hashes. 2017-05-28 12:36:27 +02:00
typedefs.pyx Tidy up rest 2017-10-27 21:07:59 +02:00
util.py Generalize handling of tokenizer special cases (#4259) 2019-11-13 21:24:35 +01:00
vectors.pyx Clip most_similar to range [-1, 1] (fixes #4506) (#4507) 2019-10-22 20:10:42 +02:00
vocab.pxd 💫 WIP: Basic lookup class scaffolding and JSON for all lemmati… (#4167) 2019-08-22 14:21:32 +02:00
vocab.pyx Update vocab.get_vector docs to include features on Fasttext ngram (#4464) 2019-10-20 01:28:18 +02:00