spaCy/spacy/tests/doc
Jan Jessewitsch e4dcac4a4b
Merging multiple docs into one (#5032)
* Add static method to Doc to allow merging of multiple docs.

* Add error description for the error that occurs if docs with different
vocabs (from different languages) are merged in Doc.from_docs().

* Add test for Doc.from_docs() implementation.

* Fix using numpy's concatenate in Doc.from_docs.

* Replace typing's type annotations in from_docs.

* Simply remove type annotations in from_docs.

* Add documentation for Doc.from_docs to api.

* Simplify from_docs, its test and the api doc for codebase consistency.

* Fix merging of Doc objects that end with whitespace (achieved by not setting the SPACY attribute on whitespace tokens). Remove two unnecessary attribute imports.

* Add merging of user data from Doc objects in from_docs. Add user data test case to corresponding test. Add applicable warning messages.

* Fix incorrectly set token idx values by using concatenated spaces (again). Add test case to corresponding test.

* Add MORPH to attrs

* Update warnings calls

* Remove outdated error from merge

* Rename space_delimiter to ensure_whitespace

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-07-03 11:32:42 +02:00
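The whitespace handling described in the commit message can be illustrated with a minimal pure-Python sketch. This is not the spaCy implementation: `SimpleDoc` and `merge_docs` are hypothetical stand-ins that model a document as token texts plus trailing-space flags. The sketch assumes that `ensure_whitespace` adds a trailing space to the last token of every document except the final one, and skips that step when the last token is itself whitespace, as the commit message suggests.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SimpleDoc:
    """Hypothetical stand-in for a Doc: token texts plus trailing-space flags."""
    words: List[str]
    spaces: List[bool]

    @property
    def text(self) -> str:
        # Rebuild the text from each token and its optional trailing space.
        return "".join(w + (" " if s else "") for w, s in zip(self.words, self.spaces))

    def token_offsets(self) -> List[int]:
        # Character offset of each token, derived from the spaces flags.
        offsets, pos = [], 0
        for w, s in zip(self.words, self.spaces):
            offsets.append(pos)
            pos += len(w) + (1 if s else 0)
        return offsets


def merge_docs(docs: Sequence[SimpleDoc], ensure_whitespace: bool = True) -> SimpleDoc:
    """Concatenate documents, optionally separating them with a space."""
    words: List[str] = []
    spaces: List[bool] = []
    for i, doc in enumerate(docs):
        doc_spaces = list(doc.spaces)
        is_last = i == len(docs) - 1
        if ensure_whitespace and not is_last and doc.words:
            # Assumption: a whitespace token already separates the documents,
            # so no trailing space is set on it.
            if not doc.words[-1].isspace():
                doc_spaces[-1] = True
        words.extend(doc.words)
        spaces.extend(doc_spaces)
    return SimpleDoc(words, spaces)


a = SimpleDoc(["Hello", "world", "!"], [True, False, False])
b = SimpleDoc(["How", "are", "you", "?"], [True, True, False, False])
merged = merge_docs([a, b])
print(merged.text)
print(merged.token_offsets())
```

Computing the offsets from the concatenated spaces, rather than from each document separately, keeps the token positions consistent with the merged text.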
__init__.py Revert #4334 2019-09-29 17:32:12 +02:00
test_add_entities.py Tidy up and auto-format 2020-06-20 14:15:04 +02:00
test_array.py Improve spacy.gold (no GoldParse, no json format!) (#5555) 2020-06-26 19:34:12 +02:00
test_creation.py Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
test_doc_api.py Merging multiple docs into one (#5032) 2020-07-03 11:32:42 +02:00
test_morphanalysis.py Tidy up and auto-format 2020-02-18 15:38:18 +01:00
test_pickle_doc.py Drop Python 2.7 and 3.5 (#4828) 2019-12-22 01:53:56 +01:00
test_retokenize_merge.py Merge branch 'develop' into master-tmp 2020-06-20 15:52:00 +02:00
test_retokenize_split.py Add MORPH attr, add support in retokenizer (#4947) 2020-01-29 17:45:46 +01:00
test_span.py Simplify warnings 2020-04-28 13:37:37 +02:00
test_to_json.py Add better schemas and validation using Pydantic (#4831) 2019-12-25 12:39:49 +01:00
test_token_api.py Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
test_underscore.py Merge branch 'master' into tmp/sync 2020-03-26 13:38:14 +01:00