Mirror of https://github.com/explosion/spaCy.git (synced 2025-10-26 05:31:15 +03:00)
* Add static method to Doc to allow merging of multiple docs.
* Add error description for the error that occurs if docs with different vocabs (from different languages) are merged in Doc.from_docs().
* Add test for Doc.from_docs() implementation.
* Fix using numpy's concatenate in Doc.from_docs.
* Replace typing's type annotations in from_docs.
* Simply remove type annotations in from_docs.
* Add documentation for Doc.from_docs to api.
* Simplify from_docs, its test and the api doc for codebase consistency.
* Fix merging of Doc objects that end with whitespace (achieved by simply not setting the SPACY attribute on whitespace tokens). Remove two unnecessary imports of attributes.
* Add merging of user data from Doc objects in from_docs. Add user data test case to corresponding test. Add applicable warning messages.
* Fix incorrect setting of token idx by using concatenated spaces (again). Add test case to corresponding test.
* Add MORPH to attrs
* Update warnings calls
* Remove outdated error from merge
* Rename space_delimiter to ensure_whitespace

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

| File |
|---|
| annotation.md |
| cli.md |
| cython-classes.md |
| cython-structs.md |
| cython.md |
| dependencyparser.md |
| doc.md |
| docbin.md |
| entitylinker.md |
| entityrecognizer.md |
| entityruler.md |
| goldcorpus.md |
| goldparse.md |
| index.md |
| kb.md |
| language.md |
| lemmatizer.md |
| lexeme.md |
| lookups.md |
| matcher.md |
| phrasematcher.md |
| pipeline-functions.md |
| scorer.md |
| sentencizer.md |
| span.md |
| stringstore.md |
| tagger.md |
| textcategorizer.md |
| token.md |
| tokenizer.md |
| top-level.md |
| vectors.md |
| vocab.md |
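
The commit message above mentions merging docs with an `ensure_whitespace` option and recomputing token `idx` values from the concatenated spaces. The following is only a minimal pure-Python sketch of that idea, not spaCy's implementation: `SimpleDoc` and `merge_docs` are hypothetical names introduced for illustration, and the rule that a space is added only when the preceding doc does not already end in whitespace is an assumption based on the commit message.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SimpleDoc:
    """Hypothetical stand-in for a tokenized document (not spaCy's Doc)."""
    words: List[str]
    spaces: List[bool]  # True if the token is followed by a single space

    @property
    def text(self) -> str:
        return "".join(w + (" " if s else "") for w, s in zip(self.words, self.spaces))

    @property
    def idx(self) -> List[int]:
        # Character offset of each token, derived from words and spaces.
        offsets, pos = [], 0
        for w, s in zip(self.words, self.spaces):
            offsets.append(pos)
            pos += len(w) + (1 if s else 0)
        return offsets


def merge_docs(docs: Sequence[SimpleDoc], ensure_whitespace: bool = True) -> SimpleDoc:
    """Concatenate docs; optionally make sure adjacent docs are separated by a space."""
    words: List[str] = []
    spaces: List[bool] = []
    for i, doc in enumerate(docs):
        doc_spaces = list(doc.spaces)
        is_last = i == len(docs) - 1
        if ensure_whitespace and not is_last and doc.words:
            # Assumption: only add a space if the doc does not already end in whitespace.
            ends_in_ws = doc_spaces[-1] or doc.words[-1].isspace()
            if not ends_in_ws:
                doc_spaces[-1] = True
        words.extend(doc.words)
        spaces.extend(doc_spaces)
    return SimpleDoc(words, spaces)


a = SimpleDoc(["Hello", "world", "!"], [True, False, False])
b = SimpleDoc(["Second", "doc", "."], [True, False, False])
merged = merge_docs([a, b])
print(merged.text)
print(merged.idx)
```

Because the offsets are derived from the concatenated `spaces` list rather than copied from the input docs, tokens from the second doc are shifted correctly even when a separating space is inserted.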