spaCy/spacy
Matthew Honnibal cc477be952
Improve gold-standard alignment (#5711)
* Remove previous alignment

* Implement better alignment, using ragged data structure

* Use pytokenizations for alignment

* Fixes

* Fix overlapping entities in alignment

* Fix align split_sents

* Update test

* Commit align.py

* Try to appease setuptools

* Fix flake8

* Use realistic entities for testing

* Update tests for better alignment

* Improve alignment heuristic

Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
2020-07-06 17:39:31 +02:00
cli Add alternative CLI option 2020-07-06 15:57:38 +02:00
displacy unicode -> str consistency [ci skip] 2020-05-24 18:51:10 +02:00
gold Improve gold-standard alignment (#5711) 2020-07-06 17:39:31 +02:00
lang Tidy up and auto-format 2020-06-21 22:38:04 +02:00
matcher Remove dead and/or deprecated code (#5710) 2020-07-06 13:06:25 +02:00
ml Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-07-04 23:52:12 +02:00
pipeline Remove dead and/or deprecated code (#5710) 2020-07-06 13:06:25 +02:00
syntax Remove dead and/or deprecated code (#5710) 2020-07-06 13:06:25 +02:00
tests Improve gold-standard alignment (#5711) 2020-07-06 17:39:31 +02:00
tokens Remove dead and/or deprecated code (#5710) 2020-07-06 13:06:25 +02:00
__init__.pxd * Seems to be working after refactor. Need to wire up more POS tag features, and wire up save/load of POS tags. 2014-10-24 02:23:42 +11:00
__init__.py Remove dead and/or deprecated code (#5710) 2020-07-06 13:06:25 +02:00
__main__.py Tidy up 2020-06-22 00:45:40 +02:00
about.py Set version to v3.0.0a1 2020-07-03 13:21:08 +02:00
attrs.pxd Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
attrs.pyx Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
compat.py Merge branch 'develop' into refactor/remove-symlinks 2020-02-18 17:22:20 +01:00
errors.py Remove dead and/or deprecated code (#5710) 2020-07-06 13:06:25 +02:00
glossary.py unicode -> str consistency 2020-05-24 17:20:58 +02:00
gold.pyx Improve spacy.gold (no GoldParse, no json format!) (#5555) 2020-06-26 19:34:12 +02:00
kb.pxd Tidy up and avoid absolute spacy imports in core 2020-05-21 20:05:03 +02:00
kb.pyx Merge branch 'develop' into master-tmp 2020-06-20 15:52:00 +02:00
language.py Remove dead and/or deprecated code (#5710) 2020-07-06 13:06:25 +02:00
lemmatizer.py Remove dead and/or deprecated code (#5710) 2020-07-06 13:06:25 +02:00
lexeme.pxd Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
lexeme.pyx Merge branch 'develop' into master-tmp 2020-06-20 15:52:00 +02:00
lookups.py Merge branch 'develop' into master-tmp 2020-06-03 14:36:59 +02:00
morphology.pxd Tidy up compiler flags and imports (#5071) 2020-03-02 11:48:10 +01:00
morphology.pyx refactor fixes (#5664) 2020-06-29 14:33:00 +02:00
parts_of_speech.pxd Add support for Universal Dependencies v2.0 2017-03-03 13:17:34 +01:00
parts_of_speech.pyx Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
pipe_analysis.py unicode -> str consistency 2020-05-24 17:20:58 +02:00
schemas.py Update DVC integration 2020-06-27 14:15:41 +02:00
scorer.py Improve gold-standard alignment (#5711) 2020-07-06 17:39:31 +02:00
strings.pxd Tidy up compiler flags and imports (#5071) 2020-03-02 11:48:10 +01:00
strings.pyx unicode -> str consistency [ci skip] 2020-05-24 18:51:10 +02:00
structs.pxd Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
symbols.pxd Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
symbols.pyx Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
tokenizer.pxd Remove dead and/or deprecated code (#5710) 2020-07-06 13:06:25 +02:00
tokenizer.pyx Remove dead and/or deprecated code (#5710) 2020-07-06 13:06:25 +02:00
typedefs.pxd Update spaCy for thinc 8.0.0 (#4920) 2020-01-29 17:06:46 +01:00
typedefs.pyx Tidy up rest 2017-10-27 21:07:59 +02:00
util.py Remove dead and/or deprecated code (#5710) 2020-07-06 13:06:25 +02:00
vectors.pyx Merge branch 'develop' into master-tmp 2020-06-20 15:52:00 +02:00
vocab.pxd Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
vocab.pyx Remove dead and/or deprecated code (#5710) 2020-07-06 13:06:25 +02:00