spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-09-09 21:52:37 +03:00

Author	SHA1	Message	Date
Adriane Boyd	40e1000db0	Restore Doc attr getter values in Doc.to_json (#11700 )	2022-11-03 11:49:08 +01:00
Paul O'Leary McCann	db56600536	Fix default parameters for load functions (fix #11706 ) (#11713 ) * Fix default parameters for load functions Some load functions used SimpleFrozenList() directly instead of the _DEFAULT_EMPTY_PIPES parameter. That mostly worked as intended, but the changes in #11459 check for equality using identity, not value, so a warning is incorrectly raised sometimes, as in #11706. This change just has all the load functions use the singleton value instead. * Add test that there are no warnings on module-based load This will succeed due to changes in this branch, but local tests with the latest release failed as intended. * Try reverting commit and see if CI changes There is an error in CI that is probably unrelated. Revert "Fix default parameters for load functions" This reverts commit `dc46b35687`. * Revert "Try reverting commit and see if CI changes" This reverts commit `2514ed07ef`. Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-11-03 10:52:59 +01:00
Adriane Boyd	1211552f0e	Modernize and simplify CI steps (#11738 ) * Use `build` instead of `python setup.py sdist` * Remove in-place build with `setup.py` * Remove `gpu` parameter and GPU tests * Keep `architecture` and `num_build_jobs` in azure steps with CI defaults * Fix use of `num_build_jobs` parameters * Remove now-unused `prefix` parameter * Test imports and CLI before installing test requirements * Remove `.egg-info` directory in addition to source directory for a warning-free `import spacy` Switch `thinc-apple-ops` test to python 3.11 (as most recent python that is tested across platforms)	2022-11-03 09:29:46 +01:00
Ryn Daniels	2fb7e4dc74	More version updates for github action deprecation warnings (#11705 ) * More version updates for github action deprecation warnings * fix the deprecated set-output commands * bump explosion-bot to run on ubuntu-latest	2022-11-02 15:36:30 +01:00
Adriane Boyd	420b1d854b	Update textcat scorer threshold behavior (#11696 ) * Update textcat scorer threshold behavior For `textcat` (with exclusive classes) the scorer should always use a threshold of 0.0 because there should be one predicted label per doc and the numeric score for that particular label should not matter. * Rename to test_textcat_multilabel_threshold * Remove all uses of threshold for multi_label=False * Update Scorer.score_cats API docs * Add tests for score_cats with thresholds * Update textcat API docs * Fix types * Convert threshold back to float * Fix threshold type in docstring * Improve formatting in Scorer API docs	2022-11-02 15:35:04 +01:00
Adriane Boyd	f7edd84b44	Switch CI to Python 3.11.0 (#11737 )	2022-11-02 13:42:20 +01:00
Aaron Zipp	d25f09468c	Spelling mistake in rule-based-matching.md (#11717 ) Changed retokenize to retokenizer	2022-10-31 13:27:12 +09:00
Paul O'Leary McCann	d61e742960	Handle Docs with no entities in EntityLinker (#11640 ) * Handle docs with no entities If a whole batch contains no entities it won't make it to the model, but it's possible for individual Docs to have no entities. Before this commit, those Docs would cause an error when attempting to concatenate arrays because the dimensions didn't match. It turns out the process of preparing the Ragged at the end of the span maker forward was a little different from list2ragged, which just uses the flatten function directly. Letting list2ragged do the conversion avoids the dimension issue. This did not come up before because in NEL demo projects it's typical for data with no entities to be discarded before it reaches the NEL component. This includes a simple direct test that shows the issue and checks it's resolved. It doesn't check if there are any downstream changes, so a more complete test could be added. A full run was tested by adding an example with no entities to the Emerson sample project. * Add a blank instance to default training data in tests Rather than adding a specific test, since not failing on instances with no entities is basic functionality, it makes sense to add it to the default set. * Fix without modifying architecture If the architecture is modified this would have to be a new version, but this change isn't big enough to merit that.	2022-10-28 10:25:34 +02:00
Paul O'Leary McCann	6b78135b9e	Add warning to install widget for M1 GPUs (#11666 ) * Add warning to install widget for M1 GPUs * Use Thinc tracking issue instead * Update website/src/widgets/quickstart-install.js Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Underline URL in warning * Update website/src/widgets/quickstart-install.js Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Don't install cupy on m1 gpus Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-10-27 15:08:24 +02:00
Adriane Boyd	865691d169	Adjust default attrs for textcat configs (#11698 )	2022-10-26 08:43:00 +02:00
Ryn Daniels	a9139907a9	update github actions to deal with deprecations (#11702 )	2022-10-26 08:15:13 +02:00
Adriane Boyd	0a9859ba01	Reduce python 3.10 in CI to one OS (#11703 )	2022-10-25 19:38:23 +02:00
Adriane Boyd	8740e4341f	Update languages and version in README and website (#11694 )	2022-10-25 14:54:54 +02:00
Adriane Boyd	88d35450dc	Rename test helper method with non-test_ name (#11701 )	2022-10-25 14:53:18 +02:00
github-actions[bot]	84d9cb6b38	Auto-format code with black (#11687 ) Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>	2022-10-21 11:54:17 +02:00
Adriane Boyd	fb280001cc	Merge pull request #11678 from adrianeboyd/chore/update-develop-from-master-v3.5 Update develop from master before v3.5	2022-10-20 15:45:19 +02:00
Adriane Boyd	6c380d4fc6	Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.5	2022-10-20 13:45:17 +02:00
Adriane Boyd	7e56701057	Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.5	2022-10-20 13:38:49 +02:00
Cellan Hall	b69d249a22	Adding `spacy-cleaner` to the spaCy universe (#11674 ) * added spacy-cleaner to the spaCy universe * Move data to right section of universe.json * Cleanup - fix typo ("replacers") - spaCy doesn't need to be marked as code - lemma of "Hello" is lower case Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>	2022-10-20 20:38:29 +09:00
Paul O'Leary McCann	bf83f6872a	Add detailed example of env dict usage (#11677 ) * Add detailed example of env dict usage * Mark code blocks as yaml	2022-10-20 20:35:03 +09:00
Adriane Boyd	3d0e895363	Set version to v3.4.2 (#11672 )	2022-10-19 17:33:55 +02:00
Edward	d66ccb8eb0	Fix multiple entries per custom extension in doc json (#11551 ) * Fix multiple extensions and character offset * Rename token_start/end to start/end * Refactor Doc.from_json based on review * Iterate over user_data items * Only add non-empty underscore entries Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-10-19 15:52:47 +02:00
Adriane Boyd	a1eacaa8db	Add python 3.11.0rc2 to CI (#11667 )	2022-10-18 14:36:06 +02:00
Paul O'Leary McCann	858565a567	Fix issues with DVC commands (#11592 ) * Fix flag handling in dvc Prior to this commit, if a flag (--verbose or --quiet) was passed to DVC, it would be added to the end of the generated dvc command line. This would result in the flag being interpreted as part of the actual command to run, rather than an argument to dvc. This would result in command lines like: spacy project run preprocess --verbose That would fail with an error that there's no such directory as `--verbose`. This change puts the flags at the front of the dvc command so that they are interpreted correctly. It removes the `run_dvc_commands` function, which had been reduced to just a for loop and wasn't used elsewhere. A separate problem is that there's no way to specify the quiet behaviour to dvc from the command line, though it's unclear if that's a bug. * Add dvc quiet flag to docs * Handle case in DVC where no commands are appropriate If you only have commands with no deps or outputs (admittedly unlikely), you get a weird error about the dvc file not existing. This gives explicit output instead. * Add support for quiet flag * Fix command execution Commands are strings now because they're joined further up.	2022-10-18 15:11:39 +09:00
Sofie Van Landeghem	2ce6aadda2	update default configs to recent versions (#11618 )	2022-10-17 12:10:03 +02:00
github-actions[bot]	ceb62352bf	Auto-format code with black (#11649 ) Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>	2022-10-14 18:04:55 +09:00
Adriane Boyd	6b5a3e7219	Extend to pydantic v1.10 (#11635 ) * Update types in `spacy.schemas` for updated pydantic+mypy	2022-10-14 08:16:49 +02:00
Sofie Van Landeghem	4d869fcc11	Small fixes to docstrings (#11610 ) * add missing scorer arg to docstring * fix class names in textcat_multilabel * add missing scorer to docstrings	2022-10-12 15:17:40 +02:00
Adriane Boyd	fe06e037bc	Fix init for pymorphy2_lookup lemmatizer mode (#11631 )	2022-10-12 12:18:39 +02:00
Paul O'Leary McCann	2e52479eec	Fix example code for spacy-wordnet (#11593 ) * Fix example code for spacy-wordnet It looks like in the most recent version, 0.1.0, it's no longer possible to pass the lang parameter to the component separately. Doing so will raise an error. * Apply suggestions from code review Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Cleanup * More cleanup Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2022-10-11 16:45:05 +02:00
Sofie Van Landeghem	29649589fc	remove dtype (#11615 )	2022-10-11 15:25:05 +02:00
Sofie Van Landeghem	ef74f8f5e4	Fix mypy error in edittree lemmatizer (#11612 ) * cleanup imports * try limiting Thinc to previous release * remove Model specification * fix code and revert Thinc constraint	2022-10-11 14:15:22 +02:00
Adriane Boyd	8cd77dd54c	Sync flake8 version across requirements (#11580 )	2022-10-04 11:23:04 +02:00
Sofie Van Landeghem	b187076a2d	fix docs (#11573 )	2022-10-03 17:01:04 +02:00
Sofie Van Landeghem	3033babe98	Merge pull request #11571 from svlandeg/copy_develop update develop with latest from master, incl CI fix	2022-10-03 14:05:51 +02:00
svlandeg	83425d4f6f	Merge branch 'copy_master' into copy_develop	2022-10-03 13:06:31 +02:00
Sofie Van Landeghem	70e21dfcad	PR to test importlib-metadata (#11569 ) * empty commit * restrict importlib-metadata to lower than 5.0.0 * restrict importlib-metadata also for validate CI step * set fixed version for CI * try flake8 5.0.4 in CI validation step * remove importlib-metadata from requirements again	2022-10-03 13:04:03 +02:00
Paul O'Leary McCann	087cc74c6a	Remove mention of 1.7 from issue template (#11570 ) It's rare to have anyone using v1 anymore, so this message is no longer helpful.	2022-10-03 11:53:21 +02:00
Sofie Van Landeghem	bf6e43ab2f	Merge pull request #11563 from svlandeg/develop_copy update develop with latest from master	2022-10-03 09:34:38 +02:00
svlandeg	9c8cdb403e	Merge branch 'master_copy' into develop_copy	2022-09-30 15:40:26 +02:00
Gabriele Picco	ff9002b726	Add Zshot Spacy plugin (#11557 ) * Add Zshot Spacy plugin Add Zshot (Zero and Few shot named entity & relationships recognition) Spacy plugin * Update website/meta/universe.json Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update website/meta/universe.json Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-09-29 17:34:44 +02:00
Sofie Van Landeghem	bcda8bc1e7	update mypy to latest version (#11546 ) * update mypy and disable it for python 3.6 * ignoring mypy's type redefinition error	2022-09-29 14:24:40 +02:00
Paul O'Leary McCann	ba63f57f81	Update docs to reflect Doc input to Language (#11555 )	2022-09-29 18:50:29 +09:00
Adriane Boyd	6d7630c5d3	Allow overriding spacy_version in spacy package meta (#11552 )	2022-09-29 10:44:06 +02:00
Peter Baumgartner	e794d4ae39	`debug data` Spancat Table Improvements (#11504 ) * update * fix format function * pull out _format_number * format with black	2022-09-28 17:16:05 +02:00
Raphael Mitsch	aea16719be	Simplify and clarify enable/disable behavior of spacy.load() (#11459 ) * Change enable/disable behavior so that arguments take precedence over config options. Extend error message on conflict. Add warning message in case of overwriting config option with arguments. * Fix tests in test_serialize_pipeline.py to reflect changes to handling of enable/disable. * Fix type issue. * Move comment. * Move comment. * Issue UserWarning instead of printing wasabi message. Adjust test. * Added pytest.warns(UserWarning) for expected warning to fix tests. * Update warning message. * Move type handling out of fetch_pipes_status(). * Add global variable for default value. Use id() to determine whether used values are default value. * Fix default value for disable. * Rename DEFAULT_PIPE_STATUS to _DEFAULT_EMPTY_PIPES.	2022-09-27 14:22:36 +02:00
Taniguchi Yasufumi	9557b0fb01	Add spacy-partial-tagger to spaCy Universe (#11538 )	2022-09-27 14:11:50 +02:00
Jacobo Myerston	3e8bc1272f	add punctuation to grc (#11426 ) * add punctuation to grc Add support for special editorial punctuation that is common in ancient Greek texts. Ancient Greek texts, as found in digital and print form, have been largely edited by scholars. Restorations and improvements are normally marked with special characters that need to be handled properly by the tokenizer. * add unit tests * simplify regex * move generic quotes to char classes * rename unit test * fix regex Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: svlandeg <svlandeg@github.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-09-27 11:38:56 +02:00
Paul O'Leary McCann	a44b7d4622	Add experimental coref docs (#11291 ) * Add experimental coref docs * Docs cleanup * Apply suggestions from code review Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Apply changes from code review * Fix prettier formatting It seems a period after a number made this think it was a list? * Update docs on examples for initialize * Add docs for coref scorers * Remove 3.4 notes from coref There won't be a "new" tag until it's in core. * Add docs for span cleaner * Fix docs * Fix docs to match spacy-experimental These weren't properly updated when the code was moved out of spacy core. * More doc fixes * Formatting * Update architectures * Fix links * Fix another link Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: svlandeg <svlandeg@github.com>	2022-09-27 18:11:23 +09:00
Adriane Boyd	877671e09a	Preserve missing entity annotation in augmenters (#11540 ) Preserve both `-` and `O` annotation in augmenters rather than relying on `Example.to_dict`'s default support for one option outside of labeled entity spans. This is intended as a temporary workaround for augmenters for v3.4.x. The behavior of `Example` and related IOB utils could be improved in the general case for v3.5.	2022-09-27 10:16:51 +02:00


15774 Commits
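
The reasoning in commit 420b1d854b (#11696) above, that a threshold should not affect exclusive-class `textcat` because exactly one label is predicted per doc, can be sketched in plain Python. This is an illustration only, not the `Scorer.score_cats` implementation: the function names, the example scores, and the `>=` comparison are assumptions.

```python
from typing import Dict, List


def predict_exclusive(scores: Dict[str, float]) -> str:
    # Exclusive classes: exactly one label per doc, so take the top-scoring
    # label. No threshold is involved, however low the top score is.
    return max(scores, key=lambda label: scores[label])


def predict_multilabel(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    # Multilabel: each label is decided independently against the threshold.
    return [label for label, score in scores.items() if score >= threshold]


scores = {"A": 0.3, "B": 0.4, "C": 0.3}
exclusive_label = predict_exclusive(scores)
multilabel_labels = predict_multilabel(scores, threshold=0.5)
```

Here `exclusive_label` is `"B"` although its score is below 0.5, while `multilabel_labels` is empty. Applying a 0.5 threshold in the exclusive case would leave the doc without any predicted label.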