spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-04-19 00:21:58 +03:00

Author	SHA1	Message	Date
Adriane Boyd	1ebe7db07c	Support local filesystem remotes for projects (#11762 ) * Support local filesystem remotes for projects * Fix support for local filesystem remotes for projects * Use `FluidPath` instead of `Pathy` to support both filesystem and remote paths * Create missing parent directories if required for local filesystem * Add a more general `_file_exists` method to support both `Pathy`, `Path`, and `smart_open`-compatible URLs * Add explicit `smart_open` dependency starting with support for `compression` flag * Update `pathy` dependency to exclude older versions that aren't compatible with required `smart_open` version * Update docs to refer to `Pathy` instead of `smart_open` for project remotes (technically you can still push to any `smart_open`-compatible path but you can't pull from them) * Add tests for local filesystem remotes * Update pathy for general BlobStat sorting * Add import * Remove _file_exists since only Pathy remotes are supported * Format CLI docs * Clean up merge	2022-11-29 11:40:58 +01:00
Sofie Van Landeghem	96c9cf3448	Merge pull request #11855 from essenmitsosse/move-styleguide-out-of-readme Move Styleguide out of Readme	2022-11-28 21:22:56 +01:00
Zhangrp	9f986af120	Add example sentence for Chinese in website meta (#11879 )	2022-11-28 14:50:30 +09:00
Marcus Blättermann	5c9faf6eea	Update menu for styleguide This reflects the removed parts from `ecbf052abd`	2022-11-27 03:48:05 +01:00
Marcus Blättermann	90141202c0	Merge branch 'move-styleguide-out-of-readme' into migrate-to-next-web-17	2022-11-27 03:48:03 +01:00
Marcus Blättermann	7f2ea20fee	Update `README.md`	2022-11-27 03:47:11 +01:00
Marcus Blättermann	c23d54fd26	Remove MDX tags from `README.md`	2022-11-27 03:47:11 +01:00
Raphael Mitsch	c0fd8a2e71	find-threshold: CLI command for multi-label classifier threshold tuning (#11280 ) * Add foundation for find-threshold CLI functionality. * Finish first draft for find-threshold. * Add tests. * Revert adjusted import statements. * Fix mypy errors. * Fix imports. * Harmonize arguments with spacy evaluate command. * Generalize component and threshold handling. Harmonize arguments with 'spacy evaluate' CLI. * Fix Spancat test. * Add beta parameter to Scorer and PRFScore. * Make beta a component scorer setting. * Remove beta. * Update nlp.config (workaround). * Reload pipeline on threshold change. Adjust tests. Remove confection reference. * Remove assumption of component being a Pipe object or having a .cfg attribute. * Adjust test output and reference values. * Remove beta references. Delete universe.json. * Reverting unnecessary changes. Removing unused default values. Renaming variables in find-cli tests. * Update spacy/cli/find_threshold.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Remove adding labels in tests. * Remove unused error * Undo changes to PRFScorer * Change default value for n_trials. Log table iteratively. * Add warnings for pointless applications of find_threshold(). * Fix imports. * Adjust type check of TextCategorizer to exclude subclasses. * Change check of if there's only one unique value in scores. * Update spacy/cli/find_threshold.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Incorporate feedback. * Fix test issue. Update docstring. * Update docs & docstring. * Update spacy/tests/test_cli.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Add examples to docs. Rename _nlp to nlp in tests. 
* Update spacy/cli/find_threshold.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update spacy/cli/find_threshold.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2022-11-25 11:44:55 +01:00
kadarakos	dece775279	correct ndim in docs (#11869 )	2022-11-25 11:31:28 +01:00
Madeesh Kannan	5ea14af32b	Add `training.before_update` callback (#11739 ) * Add `training.before_update` callback This callback can be used to implement training paradigms like gradual (un)freezing of components (e.g: the Transformer) after a certain number of training steps to mitigate catastrophic forgetting during fine-tuning. * Fix type annotation, default config value * Generalize arguments passed to the callback * Update schema * Pass `epoch` to callback, rename `current_step` to `step` * Add test * Simplify test * Replace config string with `spacy.blank` * Apply suggestions from code review Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Cleanup imports Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-11-23 17:54:58 +01:00
Paul O'Leary McCann	8271cfb4cd	Remove Learning Path spaCy (#11846 )	2022-11-23 11:03:18 +01:00
Marcus Blättermann	ecbf052abd	Remove `README.md` content from styleguide	2022-11-23 02:04:54 +01:00
Marcus Blättermann	5659eeaadd	Remove styleguide content from `README.md`	2022-11-23 02:04:54 +01:00
Marcus Blättermann	8c0ceca637	Move `README.md` content to styleguide	2022-11-23 02:04:54 +01:00
Marcus Blättermann	0794e5c6cc	Add missing files to project structure in `README.md`	2022-11-23 02:04:54 +01:00
Marcus Blättermann	96218a1e8f	Delete `styleguide.md` This is an intermediate commit, so the content of `/README.md` can be moved to the styleguide, but the history is kept	2022-11-23 02:04:54 +01:00
Marcus Blättermann	9d96e44a87	Apply Prettier to `README.md`	2022-11-23 02:04:49 +01:00
Paul O'Leary McCann	e3173bd86d	Remove spikex from Universe (#11825 )	2022-11-18 08:24:22 +01:00
Peter Baumgartner	9baa686f82	remove migration support form (#11802 )	2022-11-14 16:53:14 +01:00
Paul O'Leary McCann	bb523d4d91	Remove spacy-ray from docs (#11781 ) * Remove spacy ray from cli docs * Remove more ray docs * Remove ray from universe	2022-11-14 19:58:38 +09:00
Edward	3478ff1eb0	remove new v2 tags (#11780 )	2022-11-14 17:41:01 +09:00
Jacobo Myerston	322b5dc1df	Add greCy to Universe (#11774 ) * Update universe.json * Update universe.json fixes Github value	2022-11-10 13:21:20 +09:00
Raphael Mitsch	20bbbe3e44	Revert disable/disabled merging behavior (#11745 ) * Merge disable with disabled. Adjust warnings, errors and tests. * Replace any() with set operation. * Update spacy/tests/pipeline/test_pipe_methods.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update docs. * Remove reference to config entry nlp.enabled from docs. Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-11-08 14:58:10 +01:00
Adriane Boyd	420b1d854b	Update textcat scorer threshold behavior (#11696 ) * Update textcat scorer threshold behavior For `textcat` (with exclusive classes) the scorer should always use a threshold of 0.0 because there should be one predicted label per doc and the numeric score for that particular label should not matter. * Rename to test_textcat_multilabel_threshold * Remove all uses of threshold for multi_label=False * Update Scorer.score_cats API docs * Add tests for score_cats with thresholds * Update textcat API docs * Fix types * Convert threshold back to float * Fix threshold type in docstring * Improve formatting in Scorer API docs	2022-11-02 15:35:04 +01:00
Aaron Zipp	d25f09468c	Spelling mistake in rule-based-matching.md (#11717 ) Changed retokenize to retokenizer	2022-10-31 13:27:12 +09:00
Paul O'Leary McCann	6b78135b9e	Add warning to install widget for M1 GPUs (#11666 ) * Add warning to install widget for M1 GPUs * Use Thinc tracking issue instead * Update website/src/widgets/quickstart-install.js Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Underline URL in warning * Update website/src/widgets/quickstart-install.js Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Don't install cupy on m1 gpus Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-10-27 15:08:24 +02:00
Adriane Boyd	8740e4341f	Update languages and version in README and website (#11694 )	2022-10-25 14:54:54 +02:00
Adriane Boyd	6c380d4fc6	Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.5	2022-10-20 13:45:17 +02:00
Adriane Boyd	7e56701057	Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.5	2022-10-20 13:38:49 +02:00
Cellan Hall	b69d249a22	Adding `spacy-cleaner` to the spaCy universe (#11674 ) * added spacy-cleaner to the spaCy universe * Move data to right section of universe.json * Cleanup - fix typo ("replacers") - spaCy doesn't need to be marked as code - lemma of "Hello" is lower case Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>	2022-10-20 20:38:29 +09:00
Paul O'Leary McCann	bf83f6872a	Add detailed example of env dict usage (#11677 ) * Add detailed example of env dict usage * Mark code blocks as yaml	2022-10-20 20:35:03 +09:00
Paul O'Leary McCann	858565a567	Fix issues with DVC commands (#11592 ) * Fix flag handling in dvc Prior to this commit, if a flag (--verbose or --quiet) was passed to DVC, it would be added to the end of the generated dvc command line. This would result in the command being interpreted as part of the actual command to run, rather than an argument to dvc. This would result in command lines like: spacy project run preprocess --verbose That would fail with an error that there's no such directory as `--verbose`. This change puts the flags at the front of the dvc command so that they are interpreted correctly. It removes the `run_dvc_commands` function, which had been reduced to just a for loop and wasn't used elsewhere. A separate problem is that there's no way to specify the quiet behaviour to dvc from the command line, though it's unclear if that's a bug. * Add dvc quiet flag to docs * Handle case in DVC where no commands are appropriate If only have commands with no deps or outputs (admittedly unlikely), you get a weird error about the dvc file not existing. This gives explicit output instead. * Add support for quiet flag * Fix command execution Commands are strings now because they're joined further up.	2022-10-18 15:11:39 +09:00
Paul O'Leary McCann	2e52479eec	Fix example code for spacy-wordnet (#11593 ) * Fix example code for spacy-wordnet It looks like in the most recent version, 0.1.0, it's no longer possible to pass the lang parameter to the component separately. Doing so will raise an error. * Apply suggestions from code review Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Cleanup * More cleanup Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2022-10-11 16:45:05 +02:00
Sofie Van Landeghem	b187076a2d	fix docs (#11573 )	2022-10-03 17:01:04 +02:00
svlandeg	9c8cdb403e	Merge branch 'master_copy' into develop_copy	2022-09-30 15:40:26 +02:00
Gabriele Picco	ff9002b726	Add Zshot Spacy plugin (#11557 ) * Add Zshot Spacy plugin Add Zshot (Zero and Few shot named entity & relationships recognition) Spacy plugin * Update website/meta/universe.json Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update website/meta/universe.json Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-09-29 17:34:44 +02:00
Paul O'Leary McCann	ba63f57f81	Update docs to reflect Doc input to Language (#11555 )	2022-09-29 18:50:29 +09:00
Taniguchi Yasufumi	9557b0fb01	Add spacy-partial-tagger to spaCy Universe (#11538 )	2022-09-27 14:11:50 +02:00
Paul O'Leary McCann	a44b7d4622	Add experimental coref docs (#11291 ) * Add experimental coref docs * Docs cleanup * Apply suggestions from code review Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Apply changes from code review * Fix prettier formatting It seems a period after a number made this think it was a list? * Update docs on examples for initialize * Add docs for coref scorers * Remove 3.4 notes from coref There won't be a "new" tag until it's in core. * Add docs for span cleaner * Fix docs * Fix docs to match spacy-experimental These weren't properly updated when the code was moved out of spacy core. * More doc fixes * Formatting * Update architectures * Fix links * Fix another link Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: svlandeg <svlandeg@github.com>	2022-09-27 18:11:23 +09:00
Paul O'Leary McCann	936a5f0506	Fix English pipeline names in 3.4 release notes (#11542 )	2022-09-27 08:25:24 +02:00
Richard Hudson	6f692a06d5	Remove side effects from Doc.__init__() (#11506 ) * Remove side effects from Doc.__init__() * Changes based on review comment * Readd test * Change interface of Doc.__init__() * Simplify test Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update doc.md Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-09-26 15:58:21 +02:00
Basile Dura	f40d2fac29	fix: remove duplicate v3.2 (#11530 )	2022-09-23 13:18:51 +02:00
Raphael Mitsch	af9b01ef97	Add dependency check to project step runs (#11226 ) * Add dependency check to project step running. * Fix dependency mismatch warning. * Remove newline. * Add types-setuptools to setup.cfg. * Move types-setuptools to test requirements. Move warnings into _validate_requirements(). Handle file reading in project_run(). * Remove newline formatting for output of package conflicts. * Show full version conflict message instead of just package name. * Update spacy/cli/project/run.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Fix typo. * Re-add rephrasing of message for conflicting packages. Remove requirements path redundancy. * Update spacy/cli/project/run.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update spacy/cli/project/run.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Print unified message for requirement conflicts and missing requirements. * Update spacy/cli/project/run.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Fix warning message. * Print conflict/missing messages individually. * Print conflict/missing messages individually. * Add check_requirements setting in project.yml to disable requirements check. * Update website/docs/usage/projects.md Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update website/docs/usage/projects.md Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update description of project.yml structure in projects.md. * Update website/docs/usage/projects.md Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Prettify projects docs. Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2022-09-16 16:54:31 +02:00
Sofie Van Landeghem	df0b815c23	more explicit Example constructor example (#11489 ) * make constructor example for Example more explicit * shorten example and add spaces	2022-09-16 09:26:33 +02:00
Richard Hudson	3f0c3ad7d3	Correct alignment example and documentation (#11491 ) * Correct example and documentation * Added altered example.md * Changes based on review + apply prettier * Remove unnecessary 'the' Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com> Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>	2022-09-14 09:36:55 +02:00
Adriane Boyd	6be6913ba5	Update cupy extras (#11279 ) * Update cupy extras: * Extend to v11 * Add `cupy-cuda11x` and `cupy-wheel` * Update quickstart to use `cupy-wheel` for CUDA 10.2+ * Rename cuda-wheel to cuda-autodetect, remove repeated CUDA in menu	2022-09-13 09:04:53 +02:00
Sofie Van Landeghem	cc10a27c59	Prevent tok2vec to broadcast to listeners when predicting (#11385 ) * replicate bug with tok2vec in annotating components * add overfitting test with a frozen tok2vec * remove broadcast from predict and check doc.tensor instead * remove broadcast * proper error * slight rephrase of documentation	2022-09-12 15:36:48 +02:00
Madeesh Kannan	aac9a58c29	Add docs for the `spacy.models_and_pipes_with_nvtx_range.v1` callback (#11463 ) * Add docs for the `spacy.models_and_pipes_with_nvtx_range.v1` callback * Add `new` tag	2022-09-09 10:46:01 +02:00
Paul O'Leary McCann	2602a30d32	Fix DVC command example (#11457 ) This command doesn't have the project dir, but it's required.	2022-09-08 13:42:47 +02:00
Raphael Mitsch	1f23c615d7	Refactor KB for easier customization (#11268 ) * Add implementation of batching + backwards compatibility fixes. Tests indicate issue with batch disambiguation for custom singular entity lookups. * Fix tests. Add distinction w.r.t. batch size. * Remove redundant and add new comments. * Adjust comments. Fix variable naming in EL prediction. * Fix mypy errors. * Remove KB entity type config option. Change return types of candidate retrieval functions to Iterable from Iterator. Fix various other issues. * Update spacy/pipeline/entity_linker.py Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com> * Update spacy/pipeline/entity_linker.py Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com> * Update spacy/kb_base.pyx Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com> * Update spacy/kb_base.pyx Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com> * Update spacy/pipeline/entity_linker.py Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com> * Add error messages to NotImplementedErrors. Remove redundant comment. * Fix imports. * Remove redundant comments. * Rename KnowledgeBase to InMemoryLookupKB and BaseKnowledgeBase to KnowledgeBase. * Fix tests. * Update spacy/errors.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Move KB into subdirectory. * Adjust imports after KB move to dedicated subdirectory. * Fix config imports. * Move Candidate + retrieval functions to separate module. Fix other, small issues. * Fix docstrings and error message w.r.t. class names. Fix typing for candidate retrieval functions. * Update spacy/kb/kb_in_memory.pyx Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update spacy/ml/models/entity_linker.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Fix typing. * Change typing of mentions to be Span instead of Union[Span, str]. * Update docs. * Update EntityLinker and _architecture docs. 
* Update website/docs/api/entitylinker.md Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com> * Adjust message for E1046. * Re-add section for Candidate in kb.md, add reference to dedicated page. * Update docs and docstrings. * Re-add section + reference for KnowledgeBase.get_alias_candidates() in docs. * Update spacy/kb/candidate.pyx * Update spacy/kb/kb_in_memory.pyx * Update spacy/pipeline/legacy/entity_linker.py * Remove canididate.md. Remove mistakenly added config snippet in entity_linker.py. Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2022-09-08 10:38:07 +02:00


2983 Commits