spaCy/spacy
Matthew Honnibal 72e4d3782a
Resize doc.tensor when merging spans. Closes #1963 (#3106)
The doc.retokenize() context manager wasn't resizing doc.tensor, leading to a mismatch between the number of tokens in the doc and the number of rows in the tensor. We fix this by deleting rows from the tensor. Merged spans are represented by the vector of their last token.
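
The row-deletion idea described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code from the commit: the tensor is modelled as a list of rows (one per token) rather than the array spaCy stores in doc.tensor, and the helper name drop_merged_rows and its start/end parameters are hypothetical.

```python
def drop_merged_rows(rows, start, end):
    """Return a copy of `rows` with tokens [start, end) collapsed to one row.

    Hypothetical helper, not spaCy API: the single surviving row is the one
    that belonged to the last token of the span, i.e. rows[end - 1].
    """
    if not rows:  # tensor unset or empty: nothing to resize
        return rows
    # Keep everything before the span, then the span's last row onwards.
    return rows[:start] + rows[end - 1:]


# Five tokens with two features each; merge tokens 1..3 into one token.
tensor = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1], [4.0, 4.1]]
resized = drop_merged_rows(tensor, 1, 4)
```

After the call, resized has three rows, matching the three tokens left in the doc, and the row for the merged token is the one that belonged to the span's last token.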

* Add test for resizing doc.tensor when merging

* Add test for resizing doc.tensor when merging. Closes #1963

* Update get_lca_matrix test for develop

* Fix retokenize if tensor unset
2018-12-30 15:17:17 +01:00
cli spacy.cli.evaluate: fix TypeError (#3101) 2018-12-28 11:14:28 +01:00
data Make spacy/data a package 2017-03-18 20:04:22 +01:00
displacy Small fixes to displaCy (#3076) 2018-12-20 17:32:04 +01:00
lang Prevent exceptions from setting POS but not TAG. Closes #1773 2018-12-30 13:16:05 +01:00
syntax 💫 Prevent parser from predicting unseen classes (#3075) 2018-12-20 16:12:22 +01:00
tests Resize doc.tensor when merging spans. Closes #1963 (#3106) 2018-12-30 15:17:17 +01:00
tokens Resize doc.tensor when merging spans. Closes #1963 (#3106) 2018-12-30 15:17:17 +01:00
__init__.pxd * Seems to be working after refactor. Need to wire up more POS tag features, and wire up save/load of POS tags. 2014-10-24 02:23:42 +11:00
__init__.py Tidy up and format remaining files 2018-11-30 17:43:08 +01:00
__main__.py 💫 New JSON helpers, training data internals & CLI rewrite (#2932) 2018-11-30 20:16:14 +01:00
_align.pyx Improve alignment around quotes 2018-08-16 01:04:34 +02:00
_ml.py 💫 Better support for semi-supervised learning (#3035) 2018-12-10 16:25:33 +01:00
about.py Set version to v2.1.0a5 2018-12-21 00:26:39 +01:00
attrs.pxd Fix LANG symbol 2018-02-17 18:10:50 +01:00
attrs.pyx Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
compat.py 💫 Replace ujson, msgpack and dill/pickle/cloudpickle with srsly (#3003) 2018-12-03 01:28:22 +01:00
errors.py Small fixes to displaCy (#3076) 2018-12-20 17:32:04 +01:00
glossary.py 💫 Tidy up and auto-format .py files (#2983) 2018-11-30 17:03:03 +01:00
gold.pxd Add support for sent_start to GoldParse 2017-08-25 20:03:14 -05:00
gold.pyx Fix JSON segmentation bug that affected French 2018-12-08 10:41:24 +01:00
language.py Improve entry points and allow custom language classes via entry points (#3080) 2018-12-20 23:58:43 +01:00
lemmatizer.py 💫 Tidy up and auto-format .py files (#2983) 2018-11-30 17:03:03 +01:00
lexeme.pxd WIP on stringstore change. 27 failures 2017-05-28 14:06:40 +02:00
lexeme.pyx 💫 Add .similarity warnings for no vectors and option to exclude warnings (#2197) 2018-05-21 01:22:38 +02:00
matcher.pyx Fix behaviour of Matcher's ? quantifier for v2.1 (#3105) 2018-12-29 16:18:09 +01:00
morphology.pxd Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
morphology.pyx Fix lemmatization 2018-07-05 13:56:02 +02:00
parts_of_speech.pxd Add support for Universal Dependencies v2.0 2017-03-03 13:17:34 +01:00
parts_of_speech.pyx Tidy up rest 2017-10-27 21:07:59 +02:00
pipeline.pxd Fix names of pipeline components 2017-10-26 12:38:23 +02:00
pipeline.pyx 💫 Raise better error when using uninitialized pipeline component (#3074) 2018-12-20 15:54:53 +01:00
scorer.py 💫 Tidy up and auto-format .py files (#2983) 2018-11-30 17:03:03 +01:00
strings.pxd Try to fix StringStore clean up (see #1506) 2017-11-11 03:11:27 +03:00
strings.pyx Add get_string_id helper to spacy.strings 2018-12-10 16:09:26 +01:00
structs.pxd Make NORM a token attribute (#3029) 2018-12-08 10:49:10 +01:00
symbols.pxd Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
symbols.pyx Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop" 2018-03-27 19:23:02 +02:00
tokenizer.pxd Disable tokenizer cache for special-cases. Fixes #1250 2017-10-24 16:08:05 +02:00
tokenizer.pyx 💫 Small efficiency fixes to tokenizer (#2587) 2018-07-24 23:35:54 +02:00
typedefs.pxd Work on changing StringStore to return hashes. 2017-05-28 12:36:27 +02:00
typedefs.pyx Tidy up rest 2017-10-27 21:07:59 +02:00
util.py Improve entry points and allow custom language classes via entry points (#3080) 2018-12-20 23:58:43 +01:00
vectors.pyx Fix KeyError in Vectors.most_similar. Fixes #2648 2018-12-10 16:19:18 +01:00
vocab.pxd 💫 Small efficiency fixes to tokenizer (#2587) 2018-07-24 23:35:54 +02:00
vocab.pyx Prevent exceptions from setting POS but not TAG. Closes #1773 2018-12-30 13:16:05 +01:00