spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-10-03 10:26:48 +03:00

Author	SHA1	Message	Date
Paul O'Leary McCann	f54bfb56c9	Don't throw an error if using displacy on an unset span key (#11845 ) * Don't throw an error if using displacy on an unset span key * List available keys in W117	2022-11-28 10:01:09 +01:00
Zhangrp	9f986af120	Add example sentence for Chinese in website meta (#11879 )	2022-11-28 14:50:30 +09:00
Marcus Blättermann	5c9faf6eea	Update menu for styleguide This reflects the removed parts from `ecbf052abd`	2022-11-27 03:48:05 +01:00
Marcus Blättermann	90141202c0	Merge branch 'move-styleguide-out-of-readme' into migrate-to-next-web-17	2022-11-27 03:48:03 +01:00
Marcus Blättermann	7f2ea20fee	Update `README.md`	2022-11-27 03:47:11 +01:00
Marcus Blättermann	c23d54fd26	Remove MDX tags from `README.md`	2022-11-27 03:47:11 +01:00
Adriane Boyd	681ec20914	Add smart_open requirement, update deprecated options (#11864 ) * Switch from deprecated `ignore_ext` to `compression` * Add upload/download test for local files	2022-11-25 13:00:57 +01:00
Adriane Boyd	32396e0bda	Set version to v3.5.0	2022-11-25 12:05:25 +01:00
Adriane Boyd	378db0eb1e	Temporarily skip tests that require models/compat	2022-11-25 12:05:25 +01:00
Raphael Mitsch	c0fd8a2e71	find-threshold: CLI command for multi-label classifier threshold tuning (#11280 ) * Add foundation for find-threshold CLI functionality. * Finish first draft for find-threshold. * Add tests. * Revert adjusted import statements. * Fix mypy errors. * Fix imports. * Harmonize arguments with spacy evaluate command. * Generalize component and threshold handling. Harmonize arguments with 'spacy evaluate' CLI. * Fix Spancat test. * Add beta parameter to Scorer and PRFScore. * Make beta a component scorer setting. * Remove beta. * Update nlp.config (workaround). * Reload pipeline on threshold change. Adjust tests. Remove confection reference. * Remove assumption of component being a Pipe object or having a .cfg attribute. * Adjust test output and reference values. * Remove beta references. Delete universe.json. * Reverting unnecessary changes. Removing unused default values. Renaming variables in find-cli tests. * Update spacy/cli/find_threshold.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Remove adding labels in tests. * Remove unused error * Undo changes to PRFScorer * Change default value for n_trials. Log table iteratively. * Add warnings for pointless applications of find_threshold(). * Fix imports. * Adjust type check of TextCategorizer to exclude subclasses. * Change check of if there's only one unique value in scores. * Update spacy/cli/find_threshold.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Incorporate feedback. * Fix test issue. Update docstring. * Update docs & docstring. * Update spacy/tests/test_cli.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Add examples to docs. Rename _nlp to nlp in tests. * Update spacy/cli/find_threshold.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update spacy/cli/find_threshold.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2022-11-25 11:44:55 +01:00
kadarakos	dece775279	correct ndim in docs (#11869 )	2022-11-25 11:31:28 +01:00
Adriane Boyd	30d31fd335	Update Russian and Ukrainian lemmatizers (#11811 ) * pymorphy2 issues #11620, #11626, #11625: - #11620: pymorphy2_lookup - #11626: handle multiple forms pointing to the same normal form + handling empty POS tag - #11625: matching DET that are labelled as PRON by pymorphy2 * Move lemmatizer algorithm changes back into RussianLemmatizer * Fix uk pymorphy3_lookup mode init * Move and update tests for ru/uk lookup lemmatizer modes * Fix typo * Remove traces of previous behavior for uninflected POS * Refactor to private generic-looking pymorphy methods * Remove xfailed uk lemmatizer cases * Update spacy/lang/ru/lemmatizer.py Co-authored-by: Richard Hudson <richard@explosion.ai> Co-authored-by: Dmytro S Lituiev <d.lituiev@gmail.com> Co-authored-by: Richard Hudson <richard@explosion.ai>	2022-11-25 11:12:46 +01:00
Adriane Boyd	8f062b849c	Fix Matcher cython profile=True header (#11867 )	2022-11-24 16:03:42 +01:00
Madeesh Kannan	5ea14af32b	Add `training.before_update` callback (#11739 ) * Add `training.before_update` callback This callback can be used to implement training paradigms like gradual (un)freezing of components (e.g: the Transformer) after a certain number of training steps to mitigate catastrophic forgetting during fine-tuning. * Fix type annotation, default config value * Generalize arguments passed to the callback * Update schema * Pass `epoch` to callback, rename `current_step` to `step` * Add test * Simplify test * Replace config string with `spacy.blank` * Apply suggestions from code review Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Cleanup imports Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-11-23 17:54:58 +01:00
Paul O'Leary McCann	8271cfb4cd	Remove Learning Path spaCy (#11846 )	2022-11-23 11:03:18 +01:00
Paul O'Leary McCann	f1ddac187d	Remove unused error object (#11837 )	2022-11-23 10:51:31 +01:00
Marcus Blättermann	ecbf052abd	Remove `README.md` content from styleguide	2022-11-23 02:04:54 +01:00
Marcus Blättermann	5659eeaadd	Remove styleguide content from `README.md`	2022-11-23 02:04:54 +01:00
Marcus Blättermann	8c0ceca637	Move `README.md` content to styleguide	2022-11-23 02:04:54 +01:00
Marcus Blättermann	0794e5c6cc	Add missing files to project structure in `README.md`	2022-11-23 02:04:54 +01:00
Marcus Blättermann	96218a1e8f	Delete `styleguide.md` This is an intermediate commit, so the content of `/README.md` can be moved to the styleguide, but the history is kept	2022-11-23 02:04:54 +01:00
Marcus Blättermann	9d96e44a87	Apply Prettier to `README.md`	2022-11-23 02:04:49 +01:00
Marco Edward Gorelli	f0d8309a28	fix comparison of constants (#11834 ) Co-authored-by: MarcoGorelli <>	2022-11-21 08:12:03 +01:00
github-actions[bot]	89bfd06fbd	Auto-format code with black (#11826 ) Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>	2022-11-18 18:24:13 +09:00
Paul O'Leary McCann	e3173bd86d	Remove spikex from Universe (#11825 )	2022-11-18 08:24:22 +01:00
Adriane Boyd	a83463c5e0	Add transformer recommendation for ca (#11819 ) Model recommendation from @cayorodriguez.	2022-11-18 08:15:27 +01:00
Paul O'Leary McCann	75bb7ad541	Check textcat values for validity (#11763 ) * Check textcat values for validity * Fix error numbers * Clean up vals reference * Check category value validity through training The _validate_categories is called in update, which for multilabel is inherited from the single label component. * Formatting	2022-11-17 10:25:01 +01:00
Adriane Boyd	317b6ef99c	Update to mypy 0.990 (#11801 )	2022-11-16 14:09:10 +01:00
Paul O'Leary McCann	c0c54e44bc	Add equality definition for vectors (#11806 ) * Add equality definition for vectors This re-uses the check from sourcing components. * Use the equality check * Format Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-11-16 09:44:42 +01:00
Sofie Van Landeghem	caa9efad59	prevent rewriting an already raw URL (#11810 )	2022-11-15 14:15:00 +01:00
Denis Bezykornov	7e684ad691	Update russian tokenizer exceptions (#11753 ) * Fix typos, add couple of new abbreviations, remove nonbreaking spaces * Remove space from abbreviation Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-11-15 11:37:25 +01:00
Peter Baumgartner	9baa686f82	remove migration support form (#11802 )	2022-11-14 16:53:14 +01:00
Paul O'Leary McCann	bb523d4d91	Remove spacy-ray from docs (#11781 ) * Remove spacy ray from cli docs * Remove more ray docs * Remove ray from universe	2022-11-14 19:58:38 +09:00
Edward	3478ff1eb0	remove new v2 tags (#11780 )	2022-11-14 17:41:01 +09:00
github-actions[bot]	188a7d00eb	Auto-format code with black (#11792 ) Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>	2022-11-11 09:58:31 +01:00
Jacobo Myerston	322b5dc1df	Add greCy to Universe (#11774 ) * Update universe.json * Update universe.json fixes Github value	2022-11-10 13:21:20 +09:00
Adriane Boyd	03eebe9d1c	Update warning, add tests for project requirements check (#11777 ) * Update warning, add tests for project requirements check * Make warning more general for differences between PEP 508 and pip * Add tests for _check_requirements * Parameterize test	2022-11-09 10:59:28 +01:00
Raphael Mitsch	20bbbe3e44	Revert disable/disabled merging behavior (#11745 ) * Merge disable with disabled. Adjust warnings, errors and tests. * Replace any() with set operation. * Update spacy/tests/pipeline/test_pipe_methods.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update docs. * Remove reference to config entry nlp.enabled from docs. Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-11-08 14:58:10 +01:00
Adriane Boyd	2e3cfd758e	Use python 3.10 for GHA universe alert (#11768 )	2022-11-08 12:46:19 +09:00
Adriane Boyd	e116395f89	Add fallback in requirements check, only check once (#11735 ) * Add fallback in requirements check, only check once * Rename to skip_requirements_check * Update spacy/cli/project/run.py Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com> Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>	2022-11-07 14:46:08 +01:00
Adriane Boyd	6105f20d8a	Switch CI to python 3.11 (#11765 )	2022-11-07 13:25:40 +01:00
Adriane Boyd	e91b47a226	Check for unsafe paths in tarfile.extractall (CVE-2007-4559) (#11746 ) * Adding tarfile member sanitization to extractall() * Format * Simplify and add error message * Fix import * Add comment about CVE Co-authored-by: TrellixVulnTeam <charles.mcfarland@trellix.com>	2022-11-07 10:43:34 +01:00
Paul O'Leary McCann	b76222e56a	Raise Typer limit (#11720 ) * Raise typer limit to <0.7.0 * Raise limit to <0.8.0	2022-11-07 08:11:55 +01:00
Adriane Boyd	ea326cf47d	Fix types for Span.id and Span.id_ (#11744 )	2022-11-07 08:11:13 +01:00
github-actions[bot]	bbf64cfc43	Auto-format code with black (#11749 ) Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>	2022-11-04 11:17:43 +01:00
Adriane Boyd	40e1000db0	Restore Doc attr getter values in Doc.to_json (#11700 )	2022-11-03 11:49:08 +01:00
Paul O'Leary McCann	db56600536	Fix default parameters for load functions (fix #11706 ) (#11713 ) * Fix default parameters for load functions Some load functions used SimpleFrozenList() directly instead of the _DEFAULT_EMPTY_PIPES parameter. That mostly worked as intended, but the changes in #11459 check for equality using identity, not value, so a warning is incorrectly raised sometimes, as in #11706. This change just has all the load functions use the singleton value instead. * Add test that there are no warnings on module-based load This will succeed due to changes in this branch, but local tests with the latest release failed as intended. * Try reverting commit and see if CI changes There is an error in CI that is probably unrelated. Revert "Fix default parameters for load functions" This reverts commit `dc46b35687`. * Revert "Try reverting commit and see if CI changes" This reverts commit `2514ed07ef`. Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-11-03 10:52:59 +01:00
Adriane Boyd	1211552f0e	Modernize and simplify CI steps (#11738 ) * Use `build` instead of `python setup.py sdist` * Remove in-place build with `setup.py` * Remove `gpu` parameter and GPU tests * Keep `architecture` and `num_build_jobs` in azure steps with CI defaults * Fix use of `num_build_jobs` parameters * Remove now-unused `prefix` parameter * Test imports and CLI before installing test requirements * Remove `.egg-info` directory in addition to source directory for an warning-free `import spacy` Switch `thinc-apple-ops` test to python 3.11 (as most recent python that is tested across platforms)	2022-11-03 09:29:46 +01:00
Ryn Daniels	2fb7e4dc74	More version updates for github action deprecation warnings (#11705 ) * More version updates for github action deprecation warnings * fix the deprecated set-output commands * bump explosion-bot to run on ubuntu-latest	2022-11-02 15:36:30 +01:00
Adriane Boyd	420b1d854b	Update textcat scorer threshold behavior (#11696 ) * Update textcat scorer threshold behavior For `textcat` (with exclusive classes) the scorer should always use a threshold of 0.0 because there should be one predicted label per doc and the numeric score for that particular label should not matter. * Rename to test_textcat_multilabel_threshold * Remove all uses of threshold for multi_label=False * Update Scorer.score_cats API docs * Add tests for score_cats with thresholds * Update textcat API docs * Fix types * Convert threshold back to float * Fix threshold type in docstring * Improve formatting in Scorer API docs	2022-11-02 15:35:04 +01:00
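
Commit `e91b47a226` in the list above adds a check for unsafe paths before calling `tarfile.extractall` (CVE-2007-4559). The general idea can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code from that commit; the helper names `is_within_directory` and `safe_extract` are hypothetical and chosen here for illustration.

```python
import os
import tarfile


def is_within_directory(directory: str, target: str) -> bool:
    """Return True if `target` resolves to a location inside `directory`."""
    abs_directory = os.path.abspath(directory)
    abs_target = os.path.abspath(target)
    # commonpath of the two equals the directory only if target is inside it
    return os.path.commonpath([abs_directory, abs_target]) == abs_directory


def safe_extract(tar: tarfile.TarFile, path: str = ".") -> None:
    """Extract `tar` into `path`, refusing members that would escape it."""
    for member in tar.getmembers():
        member_path = os.path.join(path, member.name)
        if not is_within_directory(path, member_path):
            # Hypothetical error message; the real commit defines its own
            raise ValueError(f"Unsafe path in tar archive: {member.name}")
    tar.extractall(path)
```

A member named `../evil.txt` would resolve outside the destination directory, so `safe_extract` raises before anything is written. On recent Python versions, the `filter` argument of `extractall` offers related built-in protection.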

15869 Commits