spaCy/spacy
Matthew Honnibal f277bfdf0f
Add SpanGroup and Graph container types to represent arbitrary annotations (#6696)
* Draft out initial Spans data structure

* Initial span group commit

* Basic span group support on Doc

* Basic test for span group

* Compile span_group.pyx

* Draft addition of SpanGroup to DocBin

* Add deserialization for SpanGroup

* Add tests for serializing SpanGroup

* Fix serialization of SpanGroup

* Add EdgeC and GraphC structs

* Add draft Graph data structure

* Compile graph

* More work on Graph

* Update GraphC

* Upd graph

* Fix walk functions

* Let Graph take nodes and edges on construction

* Fix walking and getting

* Add graph tests

* Fix import

* Add module with the SpanGroups dict thingy

* Update test

* Rename 'span_groups' attribute

* Try to fix c++11 compilation

* Fix test

* Update DocBin

* Try to fix compilation

* Try to fix graph

* Improve SpanGroup docstrings

* Add doc.spans to documentation

* Fix serialization

* Tidy up and add docs

* Update docs [ci skip]

* Add SpanGroup.has_overlap

* WIP updated Graph API

* Start testing new Graph API

* Update Graph tests

* Update Graph

* Add docstring

Co-authored-by: Ines Montani <ines@ines.io>
2021-01-14 17:30:41 +11:00
cli Fix types of Tok2Vec encoding architectures (#6442) 2021-01-07 16:39:27 +11:00
displacy Refactor Docs.is_ flags (#6044) 2020-09-17 00:14:01 +02:00
lang Update model-related dependencies (#6725) 2021-01-14 17:29:44 +11:00
matcher Add SPACY as a Matcher attribute (#6463) 2020-11-30 09:34:50 +08:00
ml Fix types of Tok2Vec encoding architectures (#6442) 2021-01-07 16:39:27 +11:00
pipeline Sync missing and misaligned values in Tagger loss (#6689) 2021-01-10 11:30:37 +11:00
tests Add SpanGroup and Graph container types to represent arbitrary annotations (#6696) 2021-01-14 17:30:41 +11:00
tokens Add SpanGroup and Graph container types to represent arbitrary annotations (#6696) 2021-01-14 17:30:41 +11:00
training Fix train loop to avoid swallowing tracebacks (#6693) 2021-01-09 08:25:47 +08:00
__init__.pxd * Seems to be working after refactor. Need to wire up more POS tag features, and wire up save/load of POS tags. 2014-10-24 02:23:42 +11:00
__init__.py require_cpu functionality (#6336) 2020-12-08 14:42:40 +08:00
__main__.py Tidy up 2020-06-22 00:45:40 +02:00
about.py Set version to v3.0.0rc3 2020-11-03 17:29:57 +01:00
attrs.pxd Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
attrs.pyx Merge branch 'develop' into master-tmp 2020-05-21 18:39:06 +02:00
compat.py Use Literal type for nr_feature_tokens 2020-09-23 16:00:03 +02:00
default_config_pretraining.cfg pretrain architectures (#6451) 2020-12-08 14:41:03 +08:00
default_config.cfg Add initialize.before_init and after_init callbacks 2021-01-12 13:07:44 +01:00
errors.py multi-label textcat component (#6474) 2021-01-06 13:07:14 +11:00
glossary.py unicode -> str consistency 2020-05-24 17:20:58 +02:00
kb.pxd Revert added_strings change (#6236) 2020-10-10 18:55:07 +02:00
kb.pyx Revert added_strings change (#6236) 2020-10-10 18:55:07 +02:00
language.py Add initialize.before_init and after_init callbacks 2021-01-12 13:07:44 +01:00
lexeme.pxd Fix Lexeme.from_ptr 2020-08-10 16:43:37 +02:00
lexeme.pyx Update docs links in codebase 2020-09-04 12:58:50 +02:00
lookups.py Always serialize lookups and vectors to disk 2020-10-05 09:40:20 +02:00
morphology.pxd Add Lemmatizer and simplify related components (#5848) 2020-08-07 15:27:13 +02:00
morphology.pyx Prevent 0-length mem alloc (#6653) 2021-01-06 12:50:17 +11:00
parts_of_speech.pxd Add support for Universal Dependencies v2.0 2017-03-03 13:17:34 +01:00
parts_of_speech.pyx
pipe_analysis.py Tidy up and auto-format 2020-09-29 21:39:28 +02:00
schemas.py Add initialize.before_init and after_init callbacks 2021-01-12 13:07:44 +01:00
scorer.py multi-label textcat component (#6474) 2021-01-06 13:07:14 +11:00
strings.pxd Remove 'cleanup' of strings (#6007) 2020-09-01 16:12:15 +02:00
strings.pyx Update docs links in codebase 2020-09-04 12:58:50 +02:00
structs.pxd Add SpanGroup and Graph container types to represent arbitrary annotations (#6696) 2021-01-14 17:30:41 +11:00
symbols.pxd Add _ as a symbol (#6153) 2020-09-27 22:20:14 +02:00
symbols.pyx Add _ as a symbol (#6153) 2020-09-27 22:20:14 +02:00
tokenizer.pxd Simplify specials and cache checks (#6012) 2020-09-03 09:42:49 +02:00
tokenizer.pyx Use special matcher for exceptions with spaces (#6668) 2021-01-06 12:05:10 +08:00
typedefs.pxd Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master 2020-11-25 11:49:34 +01:00
typedefs.pyx Tidy up rest 2017-10-27 21:07:59 +02:00
util.py Tidy up and auto-format 2021-01-05 13:41:53 +11:00
vectors.pyx Update docs links in codebase 2020-09-04 12:58:50 +02:00
vocab.pxd Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master 2020-11-25 11:49:34 +01:00
vocab.pyx Prevent 0-length mem alloc (#6653) 2021-01-06 12:50:17 +11:00