spaCy/spacy/tests/doc
adrianeboyd aec755d3a3 Modify retokenizer to use span root attributes (#4219)
* Modify retokenizer to use span root attributes

* tag/pos/morph are set to root tag/pos/morph

* lemma and norm are reset and end up as orth (not ideal, but better
than orth of first token)

* Also handle individual merge case

* Add test

* Attempt to handle ent_iob and ent_type in merges

* Fix check for whether B-ENT should become I-ENT

* Move IOB consistency check to after attrs

Move all IOB consistency checks after attrs are set and simplify to
check entire document, modifying I to B at the beginning of the document
or if the entity type of the previous token isn't the same.

* Move IOB consistency check for single merge

Move IOB consistency check after the token array is compressed for the
single merge case.

* Update spacy/tokens/_retokenize.pyx

Co-Authored-By: Matthew Honnibal <honnibal+gh@gmail.com>

* Remove single vs. multiple merge distinction

Remove original single-instance `_merge()` and use `_bulk_merge()` (now
renamed `_merge()`) for all merges.

* Add out-of-bound check in previous entity check
2019-09-08 13:04:49 +02:00
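
The IOB consistency rule described in the commit message above (turn `I` into `B` at the start of the document, or when the previous token has a different entity type) can be sketched in plain Python. This is a minimal illustration, not the code in `spacy/tokens/_retokenize.pyx`: the function name `fix_iob` and the `(ent_iob, ent_type)` string tuples are assumptions made for readability.

```python
from typing import List, Tuple

# Hypothetical representation: (ent_iob, ent_type), e.g. ("B", "PERSON") or ("O", "").
Tag = Tuple[str, str]


def fix_iob(tags: List[Tag]) -> List[Tag]:
    """Return a copy of `tags` in which every inconsistent "I" becomes "B".

    An "I" tag is treated as inconsistent when it is the first token of the
    document or when the previous token has a different entity type.
    """
    fixed: List[Tag] = []
    for i, (iob, ent_type) in enumerate(tags):
        if iob == "I":
            # The i == 0 test doubles as the bounds check: never look at
            # index -1 when examining the previous token.
            if i == 0 or tags[i - 1][1] != ent_type:
                iob = "B"
        fixed.append((iob, ent_type))
    return fixed


if __name__ == "__main__":
    doc_tags = [("I", "GPE"), ("I", "GPE"), ("O", ""), ("I", "PERSON")]
    print(fix_iob(doc_tags))
```

Running the check once over the whole sequence, after all attributes have been assigned, keeps the logic independent of how many spans were merged.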
__init__.py Rename "tokens" tests to "doc" 2017-01-11 18:59:01 +01:00
test_add_entities.py Tidy up and format remaining files 2018-11-30 17:43:08 +01:00
test_array.py Un-xfail test 2019-03-10 15:51:15 +01:00
test_creation.py 💫 Tidy up and auto-format tests (#2967) 2018-11-27 01:09:36 +01:00
test_doc_api.py Add Doc.lang and Doc.lang_ 2019-03-11 14:21:40 +01:00
test_pickle_doc.py Tidy up and format remaining files 2018-11-30 17:43:08 +01:00
test_retokenize_merge.py Modify retokenizer to use span root attributes (#4219) 2019-09-08 13:04:49 +02:00
test_retokenize_split.py 💫 Support lexical attributes in retokenizer attrs (closes #2390) (#3325) 2019-02-24 21:13:51 +01:00
test_span.py Add util.filter_spans helper (#3686) 2019-05-08 02:33:40 +02:00
test_to_json.py 💫 Add token match pattern validation via JSON schemas (#3244) 2019-02-13 01:47:26 +11:00
test_token_api.py Fix token.conjuncts (closes #795) (#3392) 2019-03-11 17:05:45 +01:00
test_underscore.py 💫 Improve introspection of custom extension attributes (#3729) 2019-05-12 00:53:11 +02:00