spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-02-18 21:20:59 +03:00

Raphael Mitsch 8387ce4c01 Add Doc.from_json() (#10688)		2022-06-02 14:03:47 +02:00

* Implement Doc.from_json: rough draft.
* Implement Doc.from_json: first draft with tests.
* Implement Doc.from_json: added documentation on website for Doc.to_json(), Doc.from_json().
* Implement Doc.from_json: formatting changes.
* Implement Doc.to_json(): reverting unrelated formatting changes.
* Implement Doc.to_json(): fixing entity and span conversion. Moving fixture and doc <-> json conversion tests into single file.
* Implement Doc.from_json(): replaced entity/span converters with doc.char_span() calls.
* Implement Doc.from_json(): handling sentence boundaries in spans.
* Implementing Doc.from_json(): added parser-free sentence boundaries transfer.
* Implementing Doc.from_json(): added parser-free sentence boundaries transfer.
* Implementing Doc.from_json(): incorporated various PR feedback.
* Renaming fixture for document without dependencies.
* Implementing Doc.from_json(): using two sent_starts instead of one.
* Implementing Doc.from_json(): doc_without_dependency_parser() -> doc_without_deps.
* Implementing Doc.from_json(): incorporating various PR feedback. Rebased on latest master.
* Implementing Doc.from_json(): refactored Doc.from_json() to work with annotation IDs instead of their string representations.
* Implement Doc.from_json(): reverting unwanted formatting/rebasing changes.
* Implement Doc.from_json(): added check for char_span() calculation for entities.
* Update spacy/tokens/doc.pyx
* Implement Doc.from_json(): minor refactoring, additional check for token attribute consistency with corresponding test.
* Implement Doc.from_json(): removed redundancy in annotation type key naming.
* Implement Doc.from_json(): Simplifying setting annotation values.
* Implement doc.from_json(): renaming annot_types to token_attrs.
* Implement Doc.from_json(): adjustments for renaming of annot_types to token_attrs.
* Implement Doc.from_json(): removing default categories.
* Implement Doc.from_json(): simplifying lexeme initialization.
* Implement Doc.from_json(): simplifying lexeme initialization.
* Implement Doc.from_json(): refactoring to only have keys for present annotations.
* Implement Doc.from_json(): fix check for tokens' HEAD attributes.
* Implement Doc.from_json(): refactoring Doc.from_json().
* Implement Doc.from_json(): fixing span_group retrieval.
* Implement Doc.from_json(): fixing span retrieval.
* Implement Doc.from_json(): added schema for Doc JSON format. Minor refactoring in Doc.from_json().
* Implement Doc.from_json(): added comment regarding Token and Span extension support.
* Implement Doc.from_json(): renaming inconsistent_props to partial_attrs.
* Implement Doc.from_json(): adjusting error message.
* Implement Doc.from_json(): extending E1038 message.
* Implement Doc.from_json(): added params to E1038 raises.
* Implement Doc.from_json(): combined attribute collection with partial attributes check.
* Implement Doc.from_json(): added optional schema validation.
* Implement Doc.from_json(): fixed optional fields in schema, tests.
* Implement Doc.from_json(): removed redundant None check for DEP.
* Implement Doc.from_json(): added passing of schema validation message to E1037.
* Implement Doc.from_json(): removing redundant error E1040.
* Implement Doc.from_json(): changing message for E1037.
* Implement Doc.from_json(): adjusted website docs and docstring of Doc.from_json().
* Update spacy/tests/doc/test_json_doc_conversion.py
* Implement Doc.from_json(): docstring update.
* Implement Doc.from_json(): docstring update.
* Implement Doc.from_json(): website docs update.
* Implement Doc.from_json(): docstring formatting.
* Implement Doc.from_json(): docstring formatting.
* Implement Doc.from_json(): fixing Doc reference in website docs.
* Implement Doc.from_json(): reformatted website/docs/api/doc.md.
* Implement Doc.from_json(): bumped IDs of new errors to avoid merge conflicts.
* Implement Doc.from_json(): fixing bug in tests.
* Implement Doc.from_json(): fix setting of sentence starts for docs without DEP.
* Implement Doc.from_json(): add check for valid char spans when manually setting sentence boundaries. Refactor sentence boundary setting slightly. Move error message for lack of support for partial token annotations to errors.py.
* Implement Doc.from_json(): simplify token sentence start manipulation.
* Combine related error messages
* Update spacy/tests/doc/test_json_doc_conversion.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
..
architectures.md	Tagger: use unnormalized probabilities for inference (#10197)	2022-03-15 14:15:31 +01:00
attributeruler.md	Document scorers in registry and components from #8766 (#8929)	2021-08-12 12:50:03 +02:00
cli.md	Remove NBSP's across tables in the docs (#10842)	2022-05-25 09:48:39 +02:00
corpus.md	Remove NBSP's across tables in the docs (#10842)	2022-05-25 09:48:39 +02:00
cython-classes.md	Update docs, types and API consistency	2020-08-17 16:45:24 +02:00
cython-structs.md	Update docs, types and API consistency	2020-08-17 16:45:24 +02:00
cython.md	Update docs [ci skip]	2020-09-12 17:05:10 +02:00
data-formats.md	Fix references to config file in the docs & UX (#9961)	2022-01-04 14:31:26 +01:00
dependencymatcher.md	doc fixes	2020-09-12 17:38:54 +02:00
dependencyparser.md	Fix types in API docs for moves in parser and ner (#10464)	2022-03-08 13:51:11 +01:00
doc.md	Add Doc.from_json() (#10688)	2022-06-02 14:03:47 +02:00
docbin.md	Fix point typo on docbin docs (#9097)	2021-08-31 10:55:44 +02:00
edittreelemmatizer.md	Add edit tree lemmatizer (#10231)	2022-03-28 11:13:50 +02:00
entitylinker.md	Fix entity linker batching (#9669)	2022-03-04 09:17:36 +01:00
entityrecognizer.md	Fix types in API docs for moves in parser and ner (#10464)	2022-03-08 13:51:11 +01:00
entityruler.md	Add SpanRuler component (#9880)	2022-06-02 13:12:53 +02:00
example.md	Extend score_spans for overlapping & non-labeled spans (#7209)	2021-04-08 12:19:17 +02:00
index.md	Update v3 docs	2020-07-03 16:48:21 +02:00
kb.md	Tidy up docs	2021-06-28 12:08:15 +02:00
language.md	Remove NBSP's across tables in the docs (#10842)	2022-05-25 09:48:39 +02:00
legacy.md	Add test for old architectures (#10751)	2022-05-10 08:24:42 +02:00
lemmatizer.md	Add edit tree lemmatizer (#10231)	2022-03-28 11:13:50 +02:00
lexeme.md	fix 's typo's across code base (#8384)	2021-06-15 10:57:08 +02:00
lookups.md	Update docs, types and API consistency	2020-08-17 16:45:24 +02:00
matcher.md	Remove NBSP's across tables in the docs (#10842)	2022-05-25 09:48:39 +02:00
morphologizer.md	Update overwrite and scorer in API docs (#9384)	2021-10-11 10:35:07 +02:00
morphology.md	Document Assigned Attributes of Pipeline Components (#9041)	2021-09-01 12:09:39 +02:00
phrasematcher.md	🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167)	2021-10-14 15:21:40 +02:00
pipe.md	Document scorers in registry and components from #8766 (#8929)	2021-08-12 12:50:03 +02:00
pipeline-functions.md	add doc cleaner to menu (#10862)	2022-05-30 08:51:19 +02:00
scorer.md	Add micro PRF for morph scoring (#9546)	2021-10-29 10:29:29 +02:00
sentencerecognizer.md	Update overwrite and scorer in API docs (#9384)	2021-10-11 10:35:07 +02:00
sentencizer.md	Update overwrite and scorer in API docs (#9384)	2021-10-11 10:35:07 +02:00
span.md	Add SpanRuler component (#9880)	2022-06-02 13:12:53 +02:00
spancategorizer.md	Update default spans_key to sc in API docs (#10616)	2022-04-04 18:09:15 +02:00
spangroup.md	Override SpanGroups.setdefault to provide default SpanGroup (#10772)	2022-05-12 10:06:25 +02:00
spanruler.md	Add SpanRuler component (#9880)	2022-06-02 13:12:53 +02:00
stringstore.md	Fix misspelt keyword in StringStore example	2022-05-29 10:49:19 +01:00
tagger.md	Document Tagger neg_prefix, fix typo (#9821)	2021-12-07 09:42:40 +01:00
textcategorizer.md	Fix Scorer.score_cats for missing labels (#9443)	2021-12-29 11:04:39 +01:00
tok2vec.md	Tidy up docs	2021-06-28 12:08:15 +02:00
token.md	Token sent attributes more consistent (#10164)	2022-02-08 08:35:37 +01:00
tokenizer.md	Add tokenizer option to allow Matcher handling for all rules (#10452)	2022-03-24 13:21:32 +01:00
top-level.md	Update documentation for displacy style kwargs (#10841)	2022-05-30 09:11:55 +02:00
transformer.md	Update docs for spacy-transformers v1.1 data classes (#9361)	2021-10-18 14:16:58 +02:00
vectors.md	Docs for v3.3 (#10628)	2022-04-28 14:09:35 +02:00
vocab.md	Add vector deduplication (#10551)	2022-03-30 08:54:23 +02:00