spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-10-04 10:56:45 +03:00

Adriane Boyd 45c9a68828 Identify final Matcher pattern node by quantifier (#6317)	2020-10-31 12:18:48 +01:00

Modify the internal pattern representation in `Matcher` patterns to identify the final ID state using a unique quantifier rather than a combination of other attributes.

It was insufficient to identify the final ID node based on an uninitialized `quantifier` (which coincidentally had the same value as `ZERO`) together with `nr_attr` being 0. (In addition, it was potentially bug-prone that `nr_attr` was set to 0 even though attrs were allocated.)

In the case of `{"OP": "!"}` (a valid, if pointless, pattern), `nr_attr` is 0 and the quantifier is `ZERO`, so the previous methods for incrementing to the ID node at the end of the pattern could not distinguish the final ID node from the `{"OP": "!"}` pattern.
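The ambiguity described in this commit message can be illustrated with a small pure-Python sketch. It is not the Cython code in `matcher`: the names `Quantifier`, `PatternNode`, `build_pattern`, and `find_final` are hypothetical stand-ins, and the model assumes only what the message states, namely that each pattern node carries a quantifier and an `nr_attr` count and that the pattern ends with an ID node.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Quantifier(Enum):
    ZERO = auto()       # "!"  (token must not match)
    ZERO_ONE = auto()   # "?"
    ZERO_PLUS = auto()  # "*"
    ONE = auto()        # default when no "OP" is given
    ONE_PLUS = auto()   # "+"
    FINAL_ID = auto()   # dedicated marker for the trailing ID node


@dataclass
class PatternNode:
    quantifier: Quantifier
    nr_attr: int


# Hypothetical mapping from "OP" strings to quantifiers.
OPS = {
    "!": Quantifier.ZERO,
    "?": Quantifier.ZERO_ONE,
    "*": Quantifier.ZERO_PLUS,
    "1": Quantifier.ONE,
    "+": Quantifier.ONE_PLUS,
}


def build_pattern(token_specs, final_quantifier):
    """Turn token specs into nodes and append the final ID node."""
    nodes = []
    for spec in token_specs:
        nr_attr = len([key for key in spec if key != "OP"])
        nodes.append(PatternNode(OPS[spec.get("OP", "1")], nr_attr))
    nodes.append(PatternNode(final_quantifier, 0))
    return nodes


def find_final(nodes, is_final):
    """Advance to the first node that looks like the final ID node."""
    for index, node in enumerate(nodes):
        if is_final(node):
            return index
    raise ValueError("no final node found")


specs = [{"OP": "!"}, {"LOWER": "hello"}]

# Old scheme (as described): final node looks like ZERO with nr_attr == 0.
old_nodes = build_pattern(specs, Quantifier.ZERO)
old_index = find_final(
    old_nodes, lambda n: n.quantifier is Quantifier.ZERO and n.nr_attr == 0
)

# New scheme: final node carries a quantifier no token spec can produce.
new_nodes = build_pattern(specs, Quantifier.FINAL_ID)
new_index = find_final(new_nodes, lambda n: n.quantifier is Quantifier.FINAL_ID)
```

In this sketch the old check stops at the `{"OP": "!"}` node (index 0) instead of the real final node (index 2), while the dedicated quantifier finds index 2.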
cli	add reenabled pipe names back to the meta before serializing (#6219)	2020-10-08 00:44:16 +02:00
data	Make spacy/data a package	2017-03-18 20:04:22 +01:00
displacy	Fix on EntityRendered to support break lines (after last entity) (closes #5838)	2020-07-29 18:48:39 +02:00
lang	Turkish tokenization improvements (#6268)	2020-10-29 09:43:17 +01:00
matcher	Identify final Matcher pattern node by quantifier (#6317)	2020-10-31 12:18:48 +01:00
ml	Reproducibility for TextCat and Tok2Vec (#6218)	2020-10-08 00:43:46 +02:00
pipeline	Add Armenian sentence-final verchaket, Greek question mark and Arabic question mark to default punct (#5910)	2020-08-12 15:36:14 +02:00
syntax	Improve warnings around normalization tables (#5794)	2020-07-22 16:04:58 +02:00
tests	Identify final Matcher pattern node by quantifier (#6317)	2020-10-31 12:18:48 +01:00
tokens	Fix/span.sent (#6083)	2020-10-01 14:01:52 +02:00
__init__.pxd	* Seems to be working after refactor. Need to wire up more POS tag features, and wire up save/load of POS tags.	2014-10-24 02:23:42 +11:00
__init__.py	Simplify warnings	2020-04-28 13:37:37 +02:00
__main__.py	Use latest wasabi	2019-11-04 02:38:45 +01:00
_ml.py	Reproducibility for TextCat and Tok2Vec (#6218)	2020-10-08 00:43:46 +02:00
about.py	Set version to 2.3.2 (#5756)	2020-07-13 14:55:56 +02:00
analysis.py	Simplify warnings	2020-04-28 13:37:37 +02:00
attrs.pxd	Reduce stored lexemes data, move feats to lookups (#5238)	2020-05-19 15:59:14 +02:00
attrs.pyx	Reduce stored lexemes data, move feats to lookups (#5238)	2020-05-19 15:59:14 +02:00
compat.py	Replace function registries with catalogue (#4584)	2019-11-07 11:45:22 +01:00
errors.py	Add warning when Matcher subpattern is discarded (#5873)	2020-08-05 14:56:14 +02:00
glossary.py	Update tag maps and docs for English and German (#4501)	2019-10-24 12:56:05 +02:00
gold.pxd	Merge changes from master	2019-08-21 14:18:52 +02:00
gold.pyx	Updates to docstrings (#5589)	2020-06-15 14:58:36 +02:00
kb.pxd	Tidy up and avoid absolute spacy imports in core	2020-05-21 20:05:03 +02:00
kb.pyx	Merge pull request #5264 from lfiedler/issue-5230	2020-05-22 00:31:07 +02:00
language.py	Change type of texts argument in pipe to iterable (#6186)	2020-10-02 21:00:11 +02:00
lemmatizer.py	Fix lemmatizer is_base_form for python2.7 (#5734)	2020-07-09 22:11:24 +02:00
lexeme.pxd	Reduce stored lexemes data, move feats to lookups (#5238)	2020-05-19 15:59:14 +02:00
lexeme.pyx	Fix polarity of Token.is_oov and Lexeme.is_oov (#5634)	2020-06-23 13:29:51 +02:00
lookups.py	Reduce memory usage of Lookup's BloomFilter (#5606)	2020-06-26 14:09:10 +02:00
morphology.pxd	annotate kb_id through ents in doc	2019-03-22 11:36:44 +01:00
morphology.pyx	Improve tag map initialization and updating (#5768)	2020-07-19 11:13:39 +02:00
parts_of_speech.pxd	Add support for Universal Dependencies v2.0	2017-03-03 13:17:34 +01:00
parts_of_speech.pyx	Tidy up rest	2017-10-27 21:07:59 +02:00
scorer.py	Fix GoldParse init when token count differs (#5191)	2020-03-26 10:46:23 +01:00
strings.pxd	Try to fix StringStore clean up (see #1506)	2017-11-11 03:11:27 +03:00
strings.pyx	Merge branch 'master' into feature/lemmatizer	2019-03-16 13:44:22 +01:00
structs.pxd	Reduce stored lexemes data, move feats to lookups (#5238)	2020-05-19 15:59:14 +02:00
symbols.pxd	Reduce stored lexemes data, move feats to lookups (#5238)	2020-05-19 15:59:14 +02:00
symbols.pyx	Reduce stored lexemes data, move feats to lookups (#5238)	2020-05-19 15:59:14 +02:00
tokenizer.pxd	Rename to url_match	2020-05-22 12:41:03 +02:00
tokenizer.pyx	Rename to url_match	2020-05-22 12:41:03 +02:00
typedefs.pxd	Work on changing StringStore to return hashes.	2017-05-28 12:36:27 +02:00
typedefs.pyx	Tidy up rest	2017-10-27 21:07:59 +02:00
util.py	Fix Issue 6207 (#6208)	2020-10-06 11:17:37 +02:00
vectors.pyx	fix deserialization order	2020-05-30 12:53:32 +02:00
vocab.pxd	Reduce stored lexemes data, move feats to lookups (#5238)	2020-05-19 15:59:14 +02:00
vocab.pyx	Updates to docstrings (#5589)	2020-06-15 14:58:36 +02:00