spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-01-08 09:41:11 +03:00

Adriane Boyd f94168a41e Backport bugfixes from v3.1.0 to v3.0 (#8739)

* Fix scoring normalization (#7629)
* fix scoring normalization
* score weights by total sum instead of per component
* cleanup
* more cleanup
* Use a context manager when reading model (fix #7036) (#8244)
* Fix other open calls without context managers (#8245)
* Don't add duplicate patterns all the time in EntityRuler (fix #8216) (#8246)
* Don't add duplicate patterns (fix #8216)
* Refactor EntityRuler init. This simplifies the EntityRuler init code. This is helpful as prep for allowing the EntityRuler to reset itself.
* Make EntityRuler.clear reset matchers. Includes a new test for this.
* Tidy PhraseMatcher instantiation. Since the attr can be None safely now, the guard if is no longer required here. Also renamed the `_validate` attr. Maybe it's not needed?
* Fix NER test
* Add test to make sure patterns aren't increasing
* Move test to regression tests
* Exclude generated .cpp files from package (#8271)
* Fix non-deterministic deduplication in Greek lemmatizer (#8421)
* Fix setting empty entities in Example.from_dict (#8426)
* Filter W036 for entity ruler, etc. (#8424)
* Preserve paths.vectors/initialize.vectors setting in quickstart template
* Various fixes for spans in Docs.from_docs (#8487)
* Fix spans offsets if a doc ends in a single space and no space is inserted
* Also include spans key in merged doc for empty spans lists
* Fix duplicate spacy package CLI opts (#8551). Use `-c` for `--code` and not additionally for `--create-meta`, in line with the docs.
* Raise an error for textcat with <2 labels (#8584)
* Raise an error for textcat with <2 labels. Raise an error if initializing a `textcat` component without at least two labels.
* Add similar note to docs
* Update positive_label description in API docs
* Add Macedonian models to website (#8637)
* Fix Azerbaijani init, extend lang init tests (#8656)
* Extend langs in initialize tests
* Fix az init
* Fix ru/uk lemmatizer mp with spawn (#8657). Use an instance variable instead of a class variable for the morphological analyzer so that multiprocessing with spawn is possible.
* Use 0-vector for OOV lexemes (#8639)
* Set version to v3.0.7

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>

2021-07-19 09:20:40 +02:00
cli	Backport bugfixes from v3.1.0 to v3.0 (#8739)	2021-07-19 09:20:40 +02:00
displacy	Also exclude user hooks in displacy conversion (#7419)	2021-03-12 09:41:59 +01:00
lang	Backport bugfixes from v3.1.0 to v3.0 (#8739)	2021-07-19 09:20:40 +02:00
matcher	Backport bugfixes from v3.1.0 to v3.0 (#8739)	2021-07-19 09:20:40 +02:00
ml	Backport bugfixes from v3.1.0 to v3.0 (#8739)	2021-07-19 09:20:40 +02:00
pipeline	Backport bugfixes from v3.1.0 to v3.0 (#8739)	2021-07-19 09:20:40 +02:00
tests	Backport bugfixes from v3.1.0 to v3.0 (#8739)	2021-07-19 09:20:40 +02:00
tokens	Backport bugfixes from v3.1.0 to v3.0 (#8739)	2021-07-19 09:20:40 +02:00
training	Backport bugfixes from v3.1.0 to v3.0 (#8739)	2021-07-19 09:20:40 +02:00
__init__.pxd	* Seems to be working after refactor. Need to wire up more POS tag features, and wire up save/load of POS tags.	2014-10-24 02:23:42 +11:00
__init__.py	Show warning if entity_ruler runs without patterns (#7807)	2021-05-31 18:20:27 +10:00
__main__.py	Tidy up	2020-06-22 00:45:40 +02:00
about.py	Backport bugfixes from v3.1.0 to v3.0 (#8739)	2021-07-19 09:20:40 +02:00
attrs.pxd	Merge branch 'develop' into master-tmp	2020-05-21 18:39:06 +02:00
attrs.pyx	Remove unsupported attrs from attrs.IDS (#8132)	2021-06-02 19:16:57 +10:00
compat.py	Use Literal type for nr_feature_tokens	2020-09-23 16:00:03 +02:00
default_config_pretraining.cfg	pretrain architectures (#6451)	2020-12-08 14:41:03 +08:00
default_config.cfg	Support large/infinite training corpora (#7208)	2021-04-08 18:08:04 +10:00
errors.py	Backport bugfixes from v3.1.0 to v3.0 (#8739)	2021-07-19 09:20:40 +02:00
glossary.py	Add Chinese PTB tags to glossary (#7993)	2021-05-06 18:43:03 +10:00
kb.pxd	Revert added_strings change (#6236)	2020-10-10 18:55:07 +02:00
kb.pyx	KB & NEL to/from bytes (#8113)	2021-05-20 18:11:30 +10:00
language.py	Address missing config overrides post load of models (#8208)	2021-05-31 18:36:52 +10:00
lexeme.pxd	Fix Lexeme.from_ptr	2020-08-10 16:43:37 +02:00
lexeme.pyx	fix 's typo's across code base (#8384)	2021-06-15 10:57:08 +02:00
lookups.py	Update load_lookups return type and docstring (#7907)	2021-04-27 09:13:39 +02:00
morphology.pxd	Add Lemmatizer and simplify related components (#5848)	2020-08-07 15:27:13 +02:00
morphology.pyx	Prevent 0-length mem alloc (#6653)	2021-01-06 12:50:17 +11:00
parts_of_speech.pxd	Add support for Universal Dependencies v2.0	2017-03-03 13:17:34 +01:00
parts_of_speech.pyx	Drop Python 2.7 and 3.5 (#4828)	2019-12-22 01:53:56 +01:00
pipe_analysis.py	Tidy up and auto-format	2020-09-29 21:39:28 +02:00
py.typed	Add py.typed	2021-03-16 09:48:31 +01:00
schemas.py	Support env vars and CLI overrides for project.yml	2021-02-10 13:45:27 +11:00
scorer.py	Extend score_spans for overlapping & non-labeled spans (#7209)	2021-04-08 12:19:17 +02:00
strings.pxd	Remove 'cleanup' of strings (#6007)	2020-09-01 16:12:15 +02:00
strings.pyx	Make vocab update in get_docs deterministic (#7603)	2021-04-09 11:53:13 +02:00
structs.pxd	Add SpanGroup and Graph container types to represent arbitrary annotations (#6696)	2021-01-14 17:30:41 +11:00
symbols.pxd	introduce token.has_head and refer to MISSING_DEP_ (WIP)	2021-01-12 17:17:06 +01:00
symbols.pyx	introduce token.has_head and refer to MISSING_DEP_ (WIP)	2021-01-12 17:17:06 +01:00
tokenizer.pxd	Fix tokenizer cache flushing (#7836)	2021-04-22 18:14:57 +10:00
tokenizer.pyx	Fix tokenizer cache flushing (#7836)	2021-04-22 18:14:57 +10:00
typedefs.pxd	Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master	2020-11-25 11:49:34 +01:00
typedefs.pyx	Tidy up rest	2017-10-27 21:07:59 +02:00
util.py	Backport bugfixes from v3.1.0 to v3.0 (#8739)	2021-07-19 09:20:40 +02:00
vectors.pyx	Fix vectors data on GPU (#7626)	2021-04-19 18:30:03 +10:00
vocab.pxd	Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master	2020-11-25 11:49:34 +01:00
vocab.pyx	Skip vector ngram backoff if minn is not set (#7925)	2021-05-06 18:34:35 +10:00