spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-12-28 02:46:35 +03:00

Author	SHA1	Message	Date
Adriane Boyd	0fe43f40f1	Support registered vectors (#12492 ) * Support registered vectors * Format * Auto-fill [nlp] on load from config and from bytes/disk * Only auto-fill [nlp] * Undo all changes to Language.from_disk * Expand BaseVectors These methods are needed in various places for training and vector similarity. * isort * More linting * Only fill [nlp.vectors] * Update spacy/vocab.pyx Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Revert changes to test related to auto-filling [nlp] * Add vectors registry * Rephrase error about vocab methods for vectors * Switch to dummy implementation for BaseVectors.to_ops * Add initial draft of docs * Remove example from BaseVectors docs * Apply suggestions from code review Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update website/docs/api/basevectors.mdx Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Fix type and lint bpemb example * Update website/docs/api/basevectors.mdx --------- Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2023-08-01 15:46:08 +02:00
Sofie Van Landeghem	b1b20bf69d	Replace projects functionality with weasel (#12769 ) * Setting up weasel branch (#12456) * remove project-specific functionality * remove project-specific tests * remove project-specific schemas * remove project-specific information in about * remove project-specific functions in util.py * remove project-specific error strings * remove project-specific CLI commands * black formatting * restore some functions that are used beyond projects * remove project imports * remove imports * remove remote_storage tests * remove one more project unit test * update for PR 12394 * remove get_hash and get_checksum * remove upload_ and download_file methods * remove ensure_pathy * revert clumsy fingers * reinstate E970 * feat: use weasel as spacy project command (#12473) * feat: use weasel as spacy project command * build: use constrained requirement for weasel * feat: add weasel to the library requirements * build: update weasel to new version * build: use specific weasel tag * build: use weasel-0.1.0rc1 from PyPI * fix: remove weasel from requirements.txt * fix: requirements.txt and setup.cfg need to reflect each other * feat: remove legacy spacy project code * bump version * further merge fixes * isort --------- Co-authored-by: Basile Dura <bdura@users.noreply.github.com>	2023-07-07 09:10:27 +02:00
Adriane Boyd	65f6c9cd10	Support overriding registered functions in configs (#12623 ) Support overriding registered functions in configs. Previously the registry name was parsed as a section name rather than as a registry name.	2023-06-27 17:36:33 +02:00
Daniël de Kok	e2b70df012	Configure isort to use the Black profile, recursively isort the `spacy` module (#12721 ) * Use isort with Black profile * isort all the things * Fix import cycles as a result of import sorting * Add DOCBIN_ALL_ATTRS type definition * Add isort to requirements * Remove isort from build dependencies check * Typo	2023-06-14 17:48:41 +02:00
Paul O'Leary McCann	1e8bac99f3	Add tests for projects to master (#12303 ) * Add tests for projects to master * Fix git clone related issues on Windows * Add stat import	2023-02-23 10:22:57 +01:00
Adriane Boyd	80bc140533	Add grc to langs with lexeme norms in spacy-lookups-data (#12287 )	2023-02-16 17:57:02 +01:00
Edward	61b8454137	Adjust return type of `registry.find` (#12227 ) * Fix registry find return type * add dot * Add type ignore for mypy * update black formatting version * add mypy ignore to package cli * mypy type fix (for real) * Update find description in spacy/util.py Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com> * adjust mypy directive --------- Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>	2023-02-15 12:32:53 +01:00
Zhangrp	eb8bb35c13	improve ux for displacy when the serve port is in use (#11948 ) * check port in use and add itself * check port in use and add itself * Auto switch to nearest available port. * Use bind to check port instead of connect_ex. * Reformat. * Add auto_select_port argument. * update docs for displacy.serve * Update spacy/errors.py Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com> * Update website/docs/api/top-level.md Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com> * Update spacy/errors.py Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com> * Add test using multiprocessing * fix argument name * Increase sleep times Want to rule this out as a cause of test failure * Don't terminate a process that isn't alive * Refactor port finding logic This moves all the port logic into its own util function, which can be tested without having to background a server directly. * Use with for the server This ensures the server is closed correctly. * Pass in the host when checking port availability * Shorten argument name * Update error codes following merge * Add types for arguments, specify docstrings. * Add typing for arguments with default value. * Update docstring to match spaCy format. * Update docstring to match spaCy format. * Fix docs Arg name changed from `auto_select_port` to just `auto_select`. * Revert "Fix docs" This reverts commit `356966fe84`. Co-authored-by: zhiiw <1302593554@qq.com> Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com> Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>	2023-01-10 15:52:57 +09:00
github-actions[bot]	df0cb4b77b	Auto-format code with black (#11913 ) Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>	2022-12-02 14:49:12 +01:00
Adriane Boyd	6f9d630f7e	Replace Pipe type with Callable in Language (#11803 ) * Replace Pipe type with Callable in Language * Use Callable[[Doc], Doc] in the docstrings	2022-11-29 13:20:08 +01:00
Paul O'Leary McCann	db56600536	Fix default parameters for load functions (fix #11706 ) (#11713 ) * Fix default parameters for load functions Some load functions used SimpleFrozenList() directly instead of the _DEFAULT_EMPTY_PIPES parameter. That mostly worked as intended, but the changes in #11459 check for equality using identity, not value, so a warning is incorrectly raised sometimes, as in #11706. This change just has all the load functions use the singleton value instead. * Add test that there are no warnings on module-based load This will succeed due to changes in this branch, but local tests with the latest release failed as intended. * Try reverting commit and see if CI changes There is an error in CI that is probably unrelated. Revert "Fix default parameters for load functions" This reverts commit `dc46b35687`. * Revert "Try reverting commit and see if CI changes" This reverts commit `2514ed07ef`. Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-11-03 10:52:59 +01:00
Raphael Mitsch	aea16719be	Simplify and clarify enable/disable behavior of spacy.load() (#11459 ) * Change enable/disable behavior so that arguments take precedence over config options. Extend error message on conflict. Add warning message in case of overwriting config option with arguments. * Fix tests in test_serialize_pipeline.py to reflect changes to handling of enable/disable. * Fix type issue. * Move comment. * Move comment. * Issue UserWarning instead of printing wasabi message. Adjust test. * Added pytest.warns(UserWarning) for expected warning to fix tests. * Update warning message. * Move type handling out of fetch_pipes_status(). * Add global variable for default value. Use id() to determine whether used values are default value. * Fix default value for disable. * Rename DEFAULT_PIPE_STATUS to _DEFAULT_EMPTY_PIPES.	2022-09-27 14:22:36 +02:00
Sofie Van Landeghem	8fc0efc502	Allow string argument for disable/enable/exclude (#11406 ) * adding unit test for spacy.load with disable/exclude string arg * allow pure strings in from_config * update docs * upstream type adjustements * docs update * make docstring more consistent * Update spacy/language.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * two more cleanups * fix type in internal method Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-08-31 09:02:34 +02:00
Adriane Boyd	b07708d5d0	Support full prerelease versions in the compat table (#11228 ) * Support full prerelease versions in the compat table * Fix types	2022-08-04 15:14:19 +02:00
Madeesh Kannan	8f1ba4de58	Backport parser/alignment optimizations from `feature/refactor-parser` (#10952 )	2022-06-24 13:39:52 +02:00
Raphael Mitsch	4c058eb40a	`enable` argument for spacy.load() (#10784 ) * Enable flag on spacy.load: foundation for include, enable arguments. * Enable flag on spacy.load: fixed tests. * Enable flag on spacy.load: switched from pretrained model to empty model with added pipes for tests. * Enable flag on spacy.load: switched to more consistent error on misspecification of component activity. Test refactoring. Added to default config. * Enable flag on spacy.load: added support for fields not in pipeline. * Enable flag on spacy.load: removed serialization fields from supported fields. * Enable flag on spacy.load: removed 'enable' from config again. * Enable flag on spacy.load: relaxed checks in _resolve_component_activation_status() to allow non-standard pipes. * Enable flag on spacy.load: fixed relaxed checks for _resolve_component_activation_status() to allow non-standard pipes. Extended tests. * Enable flag on spacy.load: comments w.r.t. resolution workarounds. * Enable flag on spacy.load: remove include fields. Update website docs. * Enable flag on spacy.load: updates w.r.t. changes in master. * Implement Doc.from_json(): update docstrings. Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Implement Doc.from_json(): remove newline. Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Implement Doc.from_json(): change error message for E1038. Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Enable flag on spacy.load: wrapped docstring for _resolve_component_status() at 80 chars. * Enable flag on spacy.load: changed exmples for enable flag. * Remove newline. Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Fix docstring for Language._resolve_component_status(). * Rename E1038 to E1042. Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2022-06-17 20:24:13 +01:00
Adriane Boyd	a322d6d5f2	Add SpanRuler component (#9880 ) * Add SpanRuler component Add a `SpanRuler` component similar to `EntityRuler` that saves a list of matched spans to `Doc.spans[spans_key]`. The matches from the token and phrase matchers are deduplicated and sorted before assignment but are not otherwise filtered. * Update spacy/pipeline/span_ruler.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Fix cast * Add self.key property * Use number of patterns as length * Remove patterns kwarg from init * Update spacy/tests/pipeline/test_span_ruler.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Add options for spans filter and setting to ents * Add `spans_filter` option as a registered function' * Make `spans_key` optional and if `None`, set to `doc.ents` instead of `doc.spans[spans_key]`. * Update and generalize tests * Add test for setting doc.ents, fix key property type * Fix typing * Allow independent doc.spans and doc.ents * If `spans_key` is set, set `doc.spans` with `spans_filter`. * If `annotate_ents` is set, set `doc.ents` with `ents_fitler`. * Use `util.filter_spans` by default as `ents_filter`. * Use a custom warning if the filter does not work for `doc.ents`. * Enable use of SpanC.id in Span * Support id in SpanRuler as Span.id * Update types * `id` can only be provided as string (already by `PatternType` definition) * Update all uses of Span.id/ent_id in Doc * Rename Span id kwarg to span_id * Update types and docs * Add ents filter to mimic EntityRuler overwrite_ents * Refactor `ents_filter` to take `entities, spans` args for more filtering options * Give registered filters more descriptive names * Allow registered `filter_spans` filter (`spacy.first_longest_spans_filter.v1`) to take any number of `Iterable[Span]` objects as args so it can be used for spans filter or ents filter * Implement future entity ruler as span ruler Implement a compatible `entity_ruler` as `future_entity_ruler` using `SpanRuler` as the underlying component: * Add `sort_key` and `sort_reverse` to allow the sorting behavior to be customized. (Necessary for the same sorting/filtering as in `EntityRuler`.) * Implement `overwrite_overlapping_ents_filter` and `preserve_existing_ents_filter` to support `EntityRuler.overwrite_ents` settings. * Add `remove_by_id` to support `EntityRuler.remove` functionality. * Refactor `entity_ruler` tests to parametrize all tests to test both `entity_ruler` and `future_entity_ruler` * Implement `SpanRuler.token_patterns` and `SpanRuler.phrase_patterns` properties. Additional changes: * Move all config settings to top-level attributes to avoid duplicating settings in the config vs. `span_ruler/cfg`. (Also avoids a lot of casting.) * Format * Fix filter make method name * Refactor to use same error for removing by label or ID * Also provide existing spans to spans filter * Support ids property * Remove token_patterns and phrase_patterns * Update docstrings * Add span ruler docs * Fix types * Apply suggestions from code review Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Move sorting into filters * Check for all tokens in seen tokens in entity ruler filters * Remove registered sort key * Set Token.ent_id in a backwards-compatible way in Doc.set_ents * Remove sort options from API docs * Update docstrings * Rename entity ruler filters * Fix and parameterize scoring * Add id to Span API docs * Fix typo in API docs * Include explicit labeled=True for scorer Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2022-06-02 13:12:53 +02:00
Richard Hudson	32954c3bcb	Fix issues for Mypy 0.950 and Pydantic 1.9.0 (#10786 ) * Make changes to typing * Correction * Format with black * Corrections based on review * Bumped Thinc dependency version * Bumped blis requirement * Correction for older Python versions * Update spacy/ml/models/textcat.py Co-authored-by: Daniël de Kok <me@github.danieldk.eu> * Corrections based on review feedback * Readd deleted docstring line Co-authored-by: Daniël de Kok <me@github.danieldk.eu>	2022-05-25 09:33:54 +02:00
Sofie Van Landeghem	d89dac4066	hook up meta in load_model_from_config (#10400 )	2022-03-04 11:07:45 +01:00
Adriane Boyd	a9ee5bff98	Support mixed case model package names (#10223 )	2022-02-08 10:52:46 +01:00
Sofie Van Landeghem	56dcb39fb7	Fix references to config file in the docs & UX (#9961 ) * doc fixes around config file * fix typo * clarify default	2022-01-04 14:31:26 +01:00
Adriane Boyd	e6f91b6f27	Format (#9630 )	2021-11-05 09:56:26 +01:00
Adriane Boyd	a803af9dfa	Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1	2021-10-26 11:53:50 +02:00
Connor Brinton	657af5f91f	🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167 ) * 🚨 Ignore all existing Mypy errors * 🏗 Add Mypy check to CI * Add types-mock and types-requests as dev requirements * Add additional type ignore directives * Add types packages to dev-only list in reqs test * Add types-dataclasses for python 3.6 * Add ignore to pretrain * 🏷 Improve type annotation on `run_command` helper The `run_command` helper previously declared that it returned an `Optional[subprocess.CompletedProcess]`, but it isn't actually possible for the function to return `None`. These changes modify the type annotation of the `run_command` helper and remove all now-unnecessary `# type: ignore` directives. * 🔧 Allow variable type redefinition in limited contexts These changes modify how Mypy is configured to allow variables to have their type automatically redefined under certain conditions. The Mypy documentation contains the following example: ```python def process(items: List[str]) -> None: # 'items' has type List[str] items = [item.split() for item in items] # 'items' now has type List[List[str]] ... ``` This configuration change is especially helpful in reducing the number of `# type: ignore` directives needed to handle the common pattern of: * Accepting a filepath as a string * Overwriting the variable using `filepath = ensure_path(filepath)` These changes enable redefinition and remove all `# type: ignore` directives rendered redundant by this change. * 🏷 Add type annotation to converters mapping * 🚨 Fix Mypy error in convert CLI argument verification * 🏷 Improve type annotation on `resolve_dot_names` helper * 🏷 Add type annotations for `Vocab` attributes `strings` and `vectors` * 🏷 Add type annotations for more `Vocab` attributes * 🏷 Add loose type annotation for gold data compilation * 🏷 Improve `_format_labels` type annotation * 🏷 Fix `get_lang_class` type annotation * 🏷 Loosen return type of `Language.evaluate` * 🏷 Don't accept `Scorer` in `handle_scores_per_type` * 🏷 Add `string_to_list` overloads * 🏷 Fix non-Optional command-line options * 🙈 Ignore redefinition of `wandb_logger` in `loggers.py` * ➕ Install `typing_extensions` in Python 3.8+ The `typing_extensions` package states that it should be used when "writing code that must be compatible with multiple Python versions". Since SpaCy needs to support multiple Python versions, it should be used when newer `typing` module members are required. One example of this is `Literal`, which is available starting with Python 3.8. Previously SpaCy tried to import `Literal` from `typing`, falling back to `typing_extensions` if the import failed. However, Mypy doesn't seem to be able to understand what `Literal` means when the initial import means. Therefore, these changes modify how `compat` imports `Literal` by always importing it from `typing_extensions`. These changes also modify how `typing_extensions` is installed, so that it is a requirement for all Python versions, including those greater than or equal to 3.8. * 🏷 Improve type annotation for `Language.pipe` These changes add a missing overload variant to the type signature of `Language.pipe`. Additionally, the type signature is enhanced to allow type checkers to differentiate between the two overload variants based on the `as_tuple` parameter. Fixes #8772 * ➖ Don't install `typing-extensions` in Python 3.8+ After more detailed analysis of how to implement Python version-specific type annotations using SpaCy, it has been determined that by branching on a comparison against `sys.version_info` can be statically analyzed by Mypy well enough to enable us to conditionally use `typing_extensions.Literal`. This means that we no longer need to install `typing_extensions` for Python versions greater than or equal to 3.8! 🎉 These changes revert previous changes installing `typing-extensions` regardless of Python version and modify how we import the `Literal` type to ensure that Mypy treats it properly. * resolve mypy errors for Strict pydantic types * refactor code to avoid missing return statement * fix types of convert CLI command * avoid list-set confustion in debug_data * fix typo and formatting * small fixes to avoid type ignores * fix types in profile CLI command and make it more efficient * type fixes in projects CLI * put one ignore back * type fixes for render * fix render types - the sequel * fix BaseDefault in language definitions * fix type of noun_chunks iterator - yields tuple instead of span * fix types in language-specific modules * 🏷 Expand accepted inputs of `get_string_id` `get_string_id` accepts either a string (in which case it returns its ID) or an ID (in which case it immediately returns the ID). These changes extend the type annotation of `get_string_id` to indicate that it can accept either strings or IDs. * 🏷 Handle override types in `combine_score_weights` The `combine_score_weights` function allows users to pass an `overrides` mapping to override data extracted from the `weights` argument. Since it allows `Optional` dictionary values, the return value may also include `Optional` dictionary values. These changes update the type annotations for `combine_score_weights` to reflect this fact. * 🏷 Fix tokenizer serialization method signatures in `DummyTokenizer` * 🏷 Fix redefinition of `wandb_logger` These changes fix the redefinition of `wandb_logger` by giving a separate name to each `WandbLogger` version. For backwards-compatibility, `spacy.train` still exports `wandb_logger_v3` as `wandb_logger` for now. * more fixes for typing in language * type fixes in model definitions * 🏷 Annotate `_RandomWords.probs` as `NDArray` * 🏷 Annotate `tok2vec` layers to help Mypy * 🐛 Fix `_RandomWords.probs` type annotations for Python 3.6 Also remove an import that I forgot to move to the top of the module 😅 * more fixes for matchers and other pipeline components * quick fix for entity linker * fixing types for spancat, textcat, etc * bugfix for tok2vec * type annotations for scorer * add runtime_checkable for Protocol * type and import fixes in tests * mypy fixes for training utilities * few fixes in util * fix import * 🐵 Remove unused `# type: ignore` directives * 🏷 Annotate `Language._components` * 🏷 Annotate `spacy.pipeline.Pipe` * add doc as property to span.pyi * small fixes and cleanup * explicit type annotations instead of via comment Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com> Co-authored-by: svlandeg <svlandeg@github.com>	2021-10-14 15:21:40 +02:00
Adriane Boyd	d98d525bc8	Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.1-3	2021-10-14 09:41:46 +02:00
Paul O'Leary McCann	fd759a881b	Fix inconsistent lemmas (#9405 ) * Add util function to unique lists and preserve order * Use unique function instead of list(set()) list(set()) has the issue that it's not consistent between runs of the Python interpreter, so order can vary. list(set()) calls were left in a few places where they were behind calls to sorted(). I think in this case the calls to list() can be removed, but this commit doesn't do that. * Use the existing pattern for this	2021-10-11 11:38:45 +02:00
Elia Robyn Lake (Robyn Speer)	53b5f245ed	Allow IETF language codes, aliases, and close matches (#9342 ) * use language-matching to allow language code aliases Signed-off-by: Elia Robyn Speer <elia@explosion.ai> * link to "IETF language tags" in docs Signed-off-by: Elia Robyn Speer <elia@explosion.ai> * Make requirements consistent Signed-off-by: Elia Robyn Speer <elia@explosion.ai> * change "two-letter language ID" to "IETF language tag" in language docs Signed-off-by: Elia Robyn Speer <elia@explosion.ai> * use langcodes 3.2 and handle language-tag errors better Signed-off-by: Elia Robyn Speer <elia@explosion.ai> * all unknown language codes are ImportErrors Signed-off-by: Elia Robyn Speer <elia@explosion.ai> Co-authored-by: Elia Robyn Speer <elia@explosion.ai>	2021-10-05 09:52:22 +02:00
Adriane Boyd	03f234b739	Merge remote-tracking branch 'upstream/master' into develop	2021-09-27 09:10:45 +02:00
Adriane Boyd	aba6ce3a43	Handle spacy-legacy in package CLI for dependencies (#9163 ) * Handle spacy-legacy in package CLI for dependencies * Implement legacy backoff in spacy registry.find * Remove unused import * Update and format test	2021-09-08 11:46:40 +02:00
Ines Montani	d94ddd5686	Auto-detect package dependencies in spacy package (#8948 ) * Auto-detect package dependencies in spacy package * Add simple get_third_party_dependencies test * Import packages_distributions explicitly * Inline packages_distributions * Fix docstring [ci skip] * Relax catalogue requirement * Move importlib_metadata to spacy.compat with note * Include license information [ci skip]	2021-08-17 14:05:13 +02:00
Adriane Boyd	f99d6d5e39	Refactor scoring methods to use registered functions (#8766 ) * Add scorer option to components Add an optional `scorer` parameter to all pipeline components. If a scoring function is provided, it overrides the default scoring method for that component. * Add registered scorers for all components * Add `scorers` registry * Move all scoring methods outside of components as independent functions and register * Use the registered scoring methods as defaults in configs and inits Additional: * The scoring methods no longer have access to the full component, so use settings from `cfg` as default scorer options to handle settings such as `labels`, `threshold`, and `positive_label` * The `attribute_ruler` scoring method no longer has access to the patterns, so all scoring methods are called * Bug fix: `spancat` scoring method is updated to set `allow_overlap` to score overlapping spans correctly * Update Russian lemmatizer to use direct score method * Check type of cfg in Pipe.score * Fix check * Update spacy/pipeline/sentencizer.pyx Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Remove validate_examples from scoring functions * Use Pipe.labels instead of Pipe.cfg["labels"] Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-08-10 15:13:39 +02:00
Adriane Boyd	5eeb25f043	Tidy up code	2021-06-28 12:08:15 +02:00
Matthew Honnibal	f9946154d9	Add SpanCategorizer component (#6747 ) * Draft spancat model * Add spancat model * Add test for extract_spans * Add extract_spans layer * Upd extract_spans * Add spancat model * Add test for spancat model * Upd spancat model * Update spancat component * Upd spancat * Update spancat model * Add quick spancat test * Import SpanCategorizer * Fix SpanCategorizer component * Import SpanGroup * Fix span extraction * Fix import * Fix import * Upd model * Update spancat models * Add scoring, update defaults * Update and add docs * Fix type * Update spacy/ml/extract_spans.py * Auto-format and fix import * Fix comment * Fix type * Fix type * Update website/docs/api/spancategorizer.md * Fix comment Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Better defense Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Fix labels list Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update spacy/ml/extract_spans.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update spacy/pipeline/spancat.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Set annotations during update * Set annotations in spancat * fix imports in test * Update spacy/pipeline/spancat.py * replace MaxoutLogistic with LinearLogistic * fix config * various small fixes * remove set_annotations parameter in update * use our beloved tupley format with recent support for doc.spans * bugfix to allow renaming the default span_key (scores weren't showing up) * use different key in docs example * change defaults to better-working parameters from project (WIP) * register spacy.extract_spans.v1 for legacy purposes * Upd dev version so can build wheel * layers instead of architectures for smaller building blocks * Update website/docs/api/spancategorizer.md Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update website/docs/api/spancategorizer.md Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Include additional scores from overrides in combined score weights * Parameterize spans key in scoring Parameterize the `SpanCategorizer` `spans_key` for scoring purposes so that it's possible to evaluate multiple `spancat` components in the same pipeline. * Use the (intentionally very short) default spans key `sc` in the `SpanCategorizer` * Adjust the default score weights to include the default key * Adjust the scorer to use `spans_{spans_key}` as the prefix for the returned score * Revert addition of `attr_name` argument to `score_spans` and adjust the key in the `getter` instead. Note that for `spancat` components with a custom `span_key`, the score weights currently need to be modified manually in `[training.score_weights]` for them to be available during training. To suppress the default score weights `spans_sc_p/r/f` during training, set them to `null` in `[training.score_weights]`. * Update website/docs/api/scorer.md * Fix scorer for spans key containing underscore * Increment version * Add Spans to Evaluate CLI (#8439) * Add Spans to Evaluate CLI * Change to spans_key * Add spans per_type output Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Fix spancat GPU issues (#8455) * Fix GPU issues * Require thinc >=8.0.6 * Switch to glorot_uniform_init * Fix and test ngram suggester * Include final ngram in doc for all sizes * Fix ngrams for docs of the same length as ngram size * Handle batches of docs that result in no ngrams * Add tests Co-authored-by: Ines Montani <ines@ines.io> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: Nirant <NirantK@users.noreply.github.com>	2021-06-24 12:35:27 +02:00
Adriane Boyd	9fde258053	Use minor version for compatibility check (#8403 ) * Use minor version for compatibility check * Use minor version of compatibility table * Soften warning message about incompatible models * Add test for presence of current version in compatibility table * Add test for download compatibility table * Use minor version of lower pin in error message if possible * Fall back to spacy_git_version if available * Fix unknown version string	2021-06-21 09:39:22 +02:00
Adriane Boyd	5646fcbe46	Merge remote-tracking branch 'upstream/develop' into chore/develop-into-master-v3.1	2021-06-15 15:05:17 +02:00
Narayan Acharya	6b79714080	Address missing config overrides post load of models (#8208 )	2021-05-31 18:36:52 +10:00
Adriane Boyd	7cf5bd072f	Refactor util.to_ternary_int (#7944 ) * Refactor to avoid literal comparison with `is` * Extend tests	2021-04-29 16:58:54 +02:00
Sofie Van Landeghem	e0b29f8ef7	Fix scoring normalization (#7629 ) * fix scoring normalization * score weights by total sum instead of per component * cleanup * more cleanup	2021-04-26 16:53:38 +02:00
Adriane Boyd	f68fc29130	Update sent_starts in Example.from_dict (#7847 ) * Update sent_starts in Example.from_dict Update `sent_starts` for `Example.from_dict` so that `Optional[bool]` values have the same meaning as for `Token.is_sent_start`. Use `Optional[bool]` as the type for sent start values in the docs. * Use helper function for conversion to ternary ints	2021-04-22 11:32:45 +02:00
Adriane Boyd	15bd230413	Set catalogue lower pin to v2.0.3 (#7762 ) * Set catalogue lower pin to v2.0.2 * Update importlib-metadata pins to match * Require catalogue v2.0.3 Switch to vendored `importlib-metadata` v3.2.0 provided by `catalogue`.	2021-04-19 18:37:17 +10:00
Adriane Boyd	39153ef90f	Update lexeme_norm checks * Add util method for check * Add new languages to list with lexeme norm tables * Add check to all relevant components * Add config details to warning message Note that we're not actually inspecting the model config to see if `NORM` is used as an attribute, so it may warn in cases where it's not relevant.	2021-03-19 10:59:27 +01:00
Adriane Boyd	0ad9e16ec3	Check for callbacks entry points	2021-03-18 21:18:25 +01:00
Adriane Boyd	d746ea6278	Add warning about GPU selection in Jupyter notebooks (#7075 ) * Initial warning * Update check * Redo edit * Move jupyter warning to helper method * Add link with details to warnings	2021-03-09 15:35:21 +01:00
Adriane Boyd	e9f7f9a4bc	Fix is_cython_func for additional imported code * Fix `is_cython_func` for imported code loaded under `python_code` module name * Add `make_named_tempfile` context manager to test utils to test loading of imported code * Add test for validation of `initialize` params in custom module	2021-03-01 16:37:39 +01:00
Adriane Boyd	a3293efc48	Add time and level to default logging formatter	2021-02-15 14:19:20 +01:00
Ines Montani	21176c69b0	Update and add test	2021-02-10 14:12:00 +11:00
Koichi Yasuoka	8ed788660b	Several callable objects do not have __qualname__	2021-02-09 14:43:02 +09:00
Sofie Van Landeghem	f319d2765f	Add capture argument to project_run (#6878 ) * add capture argument to project_run and run_commands * git bump to 3.0.1 * Set version to 3.0.1.dev0 Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>	2021-02-02 10:11:15 +08:00
Ines Montani	325f47500d	Move replacement logic to Language.from_config	2021-01-29 19:37:04 +11:00
Ines Montani	01ecfbcc45	Merge branch 'develop' into feature/replace-listeners	2021-01-29 15:57:32 +11:00

1 2 3 4 5 ...

489 Commits