spaCy/website/docs/api
Jan Jessewitsch e4dcac4a4b
Merging multiple docs into one (#5032)
* Add static method to Doc to allow merging of multiple docs.

* Add error description for the error that occurs if docs with different
vocabs (from different languages) are merged in Doc.from_docs().

* Add test for Doc.from_docs() implementation.

* Fix using numpy's concatenate in Doc.from_docs.

* Replace typing's type annotations in from_docs.

* Simply remove type annotations in from_docs.

* Add documentation for Doc.from_docs to api.

* Simplify from_docs, its test and the api doc for codebase consistency.

* Fix merging of Doc objects that end with whitespace by not setting the SPACY attribute on whitespace tokens. Remove two unnecessary imports of attributes.

* Add merging of user data from Doc objects in from_docs. Add user data test case to corresponding test. Add applicable warning messages.

* Fix incorrect setting of token idx values by using concatenated spaces (again). Add test case to corresponding test.

* Add MORPH to attrs

* Update warnings calls

* Remove outdated error from merge

* Rename space_delimiter to ensure_whitespace

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-07-03 11:32:42 +02:00
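
The merging behaviour described in the commit message above can be sketched in plain Python. This is an illustration only, not spaCy's implementation: `SimpleDoc` and `merge_docs` are hypothetical stand-ins for `Doc` and `Doc.from_docs`, and the handling of conflicting user data keys (keep the first value and warn) is an assumption made for the sketch. It shows the three points the commit mentions: a separating space is added between docs when `ensure_whitespace` is set, no space flag is set on a trailing whitespace token, and token offsets are derived from the concatenated words and spaces.

```python
import warnings
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SimpleDoc:
    """Hypothetical stand-in for a Doc: token texts plus trailing-space flags."""
    words: List[str]
    spaces: List[bool]
    user_data: Dict[str, object] = field(default_factory=dict)

    @property
    def text(self) -> str:
        # Rebuild the text from each token and its trailing-space flag.
        return "".join(w + (" " if s else "") for w, s in zip(self.words, self.spaces))

    @property
    def idx(self) -> List[int]:
        # Character offset of each token, computed from words and spaces.
        offsets, pos = [], 0
        for w, s in zip(self.words, self.spaces):
            offsets.append(pos)
            pos += len(w) + (1 if s else 0)
        return offsets


def merge_docs(docs: List[SimpleDoc], ensure_whitespace: bool = True) -> SimpleDoc:
    """Concatenate several SimpleDoc objects into one (illustrative sketch)."""
    words, spaces, user_data = [], [], {}
    for i, doc in enumerate(docs):
        doc_spaces = list(doc.spaces)
        is_last = i == len(docs) - 1
        # Separate this doc from the next one with a space, unless its final
        # token is already whitespace (or there is no next doc).
        if ensure_whitespace and not is_last and doc.words and not doc.words[-1].isspace():
            doc_spaces[-1] = True
        words.extend(doc.words)
        spaces.extend(doc_spaces)
        for key, value in doc.user_data.items():
            if key in user_data:
                # Assumption for this sketch: keep the first value and warn.
                warnings.warn(f"Skipping duplicate user data key: {key!r}")
            else:
                user_data[key] = value
    return SimpleDoc(words, spaces, user_data)


first = SimpleDoc(["Hello", "world"], [True, False], {"source": "a"})
second = SimpleDoc(["Bye", "!"], [False, False], {"lang": "en"})
merged = merge_docs([first, second])
print(merged.text)  # Hello world Bye!
print(merged.idx)   # [0, 6, 12, 15]
```

Passing `ensure_whitespace=False` to the sketch joins the docs directly, so the same inputs would produce `Hello worldBye!`.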
annotation.md Update tag maps and docs for English and German (#4501) 2019-10-24 12:56:05 +02:00
cli.md Remove inline notes on v2 changes [ci skip] 2020-07-01 22:29:22 +02:00
cython-classes.md unicode -> str consistency 2020-05-24 17:23:00 +02:00
cython-structs.md Documentation updates for v2.3.0 (#5593) 2020-06-16 15:37:35 +02:00
cython.md 💫 Update website (#3285) 2019-02-17 19:31:19 +01:00
dependencyparser.md unicode -> str consistency 2020-05-24 17:23:00 +02:00
doc.md Merging multiple docs into one (#5032) 2020-07-03 11:32:42 +02:00
docbin.md DocBin: add version number, missing attributes and strings (#5685) 2020-07-02 17:41:50 +02:00
entitylinker.md unicode -> str consistency 2020-05-24 17:23:00 +02:00
entityrecognizer.md Merge branch 'develop' into master-tmp 2020-06-03 14:36:59 +02:00
entityruler.md Start updating website for v3 [ci skip] 2020-07-01 21:26:39 +02:00
goldcorpus.md unicode -> str consistency 2020-05-24 17:23:00 +02:00
goldparse.md Start updating website for v3 [ci skip] 2020-07-01 21:26:39 +02:00
index.md 💫 Update website (#3285) 2019-02-17 19:31:19 +01:00
kb.md unicode -> str consistency 2020-05-24 17:23:00 +02:00
language.md Remove inline notes on v2 changes [ci skip] 2020-07-01 22:29:22 +02:00
lemmatizer.md Start updating website for v3 [ci skip] 2020-07-01 21:26:39 +02:00
lexeme.md Merge branch 'develop' into master-tmp 2020-06-20 15:52:00 +02:00
lookups.md unicode -> str consistency 2020-05-24 17:23:00 +02:00
matcher.md Update matcher usage examples [ci skip] 2020-07-02 15:39:45 +02:00
phrasematcher.md Update matcher usage examples [ci skip] 2020-07-02 15:39:45 +02:00
pipeline-functions.md unicode -> str consistency 2020-05-24 17:23:00 +02:00
scorer.md Fix markup 2020-07-01 13:02:07 +02:00
sentencizer.md Merge branch 'develop' into master-tmp 2020-06-20 15:52:00 +02:00
span.md Start updating website for v3 [ci skip] 2020-07-01 21:26:39 +02:00
stringstore.md Start updating website for v3 [ci skip] 2020-07-01 21:26:39 +02:00
tagger.md unicode -> str consistency 2020-05-24 17:23:00 +02:00
textcategorizer.md unicode -> str consistency 2020-05-24 17:23:00 +02:00
token.md Start updating website for v3 [ci skip] 2020-07-01 21:26:39 +02:00
tokenizer.md Merge branch 'develop' into master-tmp 2020-06-20 15:52:00 +02:00
top-level.md Remove inline notes on v2 changes [ci skip] 2020-07-01 22:29:22 +02:00
vectors.md Start updating website for v3 [ci skip] 2020-07-01 21:26:39 +02:00
vocab.md Start updating website for v3 [ci skip] 2020-07-01 21:26:39 +02:00