spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-10-02 18:06:46 +03:00

Author	SHA1	Message	Date
Adriane Boyd	4f37e4031c	Update spacy/ml/tb_framework.pyx Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>	2023-07-20 09:59:19 +02:00
svlandeg	96f2e30c4b	cython fixes and cleanup	2023-07-19 17:41:29 +02:00
svlandeg	846472129c	merge fixes (2)	2023-07-19 16:38:37 +02:00
svlandeg	47a82c6164	merge fixes	2023-07-19 16:38:29 +02:00
svlandeg	0e3b6a87d6	Merge branch 'upstream_master' into sync_v4	2023-07-19 16:37:31 +02:00
Sofie Van Landeghem	ea54d1775a	Merge pull request #12840 from svlandeg/sync_develop Sync develop	2023-07-19 13:12:51 +02:00
svlandeg	79ec68f01b	Merge branch 'upstream_master' into sync_develop	2023-07-19 12:08:52 +02:00
Basile Dura	b0228d8ea6	ci: add cython linter (#12694) * chore: add cython-linter dev dependency * fix: lexeme.pyx * fix: morphology.pxd * fix: tokenizer.pxd * fix: vocab.pxd * fix: morphology.pxd (line length) * ci: add cython-lint * ci: fix cython-lint call * Fix kb/candidate.pyx. * Fix kb/kb.pyx. * Fix kb/kb_in_memory.pyx. * Fix kb. * Fix training/ partially. * Fix training/. Ignore trailing whitespaces and too long lines. * Fix ml/. * Fix matcher/. * Fix pipeline/. * Fix tokens/. * Fix build errors. Fix vocab.pyx. * Fix cython-lint install and run. * Fix lexeme.pyx, parts_of_speech.pxd, vectors.pyx. Temporarily disable cython-lint execution. * Fix attrs.pyx, lexeme.pyx, symbols.pxd, isort issues. * Make cython-lint install conditional. Fix tokenizer.pyx. * Fix remaining files. Reenable cython-lint check. * Readded parentheses. * Fix test_build_dependencies(). * Add explanatory comment to cython-lint execution. --------- Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>	2023-07-19 12:03:31 +02:00
Adriane Boyd	1509c96694	Clean up unused code in Language (#12836) Follow-up to #12701.	2023-07-18 14:10:30 +02:00
Adriane Boyd	6bf7c65329	Update matcher pattern validation tests (#12835) - parametrize over individual token patterns (as originally intended, as far as I can tell) - add a test for lowercase `in` in patterns	2023-07-18 10:00:07 +02:00
Adriane Boyd	95075298f5	Update pex Makefile defaults (#12832) * Update pex Makefile defaults - switch to python 3.8 - only install spacy-lookups-data for extra packages * Update website for pex defaults	2023-07-18 09:29:04 +02:00
Ian Thompson	ef20e114e0	Typo fix in `Language.replace_listeners` docs (#12823) * modified: spacy/language.py - corrected typo in docstring for :method:`Language.replace_listeners` - added noqa comment on unused local variable assignment in :method:`Language.from_config` as I wasn't sure if it should be unassigned modified: website/docs/api/language.mdx - corrected typo in `Language.replace_listeners` markdown * modified: spacy/language.py - removed noqa comment --------- Co-authored-by: Ian Thompson <ian.thompson@hrblock.com>	2023-07-14 09:45:54 +02:00
Connor Brinton	0566c3a166	🐛 Escape annotated HTML tags in span renderer (#12817) These changes add a missing call to `escape_html` in the displaCy span renderer. Previously span-annotated tokens would be inserted into the page markup without being escaped, resulting in potentially incorrect rendering. When I encountered this issue, it resulted in some docs and span underlines being superimposed on top of properly rendered docs and span underlines near the beginning of the visualization (due to an unescaped `<span>` tag).	2023-07-13 17:33:05 +02:00
Sofie Van Landeghem	ddffd09602	Trainable lemmatizer docs link (#12795) * add an anchor to the trainable lemmatizer section * add requirement for morphologizer,tagger to rule-based lemmatizer * morphologizer only	2023-07-07 15:18:16 +02:00
Adriane Boyd	1a55661cfb	Update website binder version to v3.6 (#12805)	2023-07-07 10:52:33 +02:00
Adriane Boyd	41dba5bd34	Update max_length default in span finder docs (#12803)	2023-07-07 10:17:41 +02:00
Sofie Van Landeghem	b1b20bf69d	Replace projects functionality with weasel (#12769) * Setting up weasel branch (#12456) * remove project-specific functionality * remove project-specific tests * remove project-specific schemas * remove project-specific information in about * remove project-specific functions in util.py * remove project-specific error strings * remove project-specific CLI commands * black formatting * restore some functions that are used beyond projects * remove project imports * remove imports * remove remote_storage tests * remove one more project unit test * update for PR 12394 * remove get_hash and get_checksum * remove upload_ and download_file methods * remove ensure_pathy * revert clumsy fingers * reinstate E970 * feat: use weasel as spacy project command (#12473) * feat: use weasel as spacy project command * build: use constrained requirement for weasel * feat: add weasel to the library requirements * build: update weasel to new version * build: use specific weasel tag * build: use weasel-0.1.0rc1 from PyPI * fix: remove weasel from requirements.txt * fix: requirements.txt and setup.cfg need to reflect each other * feat: remove legacy spacy project code * bump version * further merge fixes * isort --------- Co-authored-by: Basile Dura <bdura@users.noreply.github.com>	2023-07-07 09:10:27 +02:00
Sofie Van Landeghem	9e63006b12	Merge pull request #12800 from explosion/master_copy Sync develop with master	2023-07-07 08:44:19 +02:00
svlandeg	991bcc111e	disable tests until 3.7 models are available	2023-07-07 08:09:57 +02:00
Madeesh Kannan	d195923164	Set version to `3.7.0.dev0` (#12799)	2023-07-06 18:29:03 +02:00
svlandeg	d26e4e0849	Revert "feat: add example stubs (#12679)" This reverts commit `30bb34533a`.	2023-07-06 17:02:38 +02:00
Basile Dura	30bb34533a	feat: add example stubs (#12679) * feat: add example stubs * fix: add required annotations * fix: mypy issues * fix: use Py36-compatible Portocol * Minor reformatting --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: svlandeg <svlandeg@github.com>	2023-07-06 16:49:43 +02:00
Sofie Van Landeghem	536798f9e3	Disallow False for first/last arguments of add_pipe (#12793) * Literal True for first/last options * add test case * update docs * remove old redundant test case * black formatting * use Optional typing in docstrings Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com> --------- Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>	2023-07-06 15:20:13 +02:00
Adriane Boyd	6fc153a266	Merge pull request #12794 from adrianeboyd/chore/v3.6.0-2 Reenable compat+models tests for v3.6.0	2023-07-06 13:22:21 +02:00
Adriane Boyd	4e19ec7eb8	Docs for v3.6.0 (#12792) * Docs for v3.6.0 * Add sl performance * Add da trf note	2023-07-06 12:58:25 +02:00
Adriane Boyd	76329e1dde	Revert "Temporarily skip download CLI related tests in CI" This reverts commit `46ce66021a`.	2023-07-06 12:48:06 +02:00
Adriane Boyd	a1191146f5	Revert "Temporarily skip tests for compat table" This reverts commit `dd5e00c735`.	2023-07-06 12:47:50 +02:00
Adriane Boyd	830dcca367	SpanFinder: set default max_length to 25 (#12791) When the default `max_length` is not set and there are longer training documents, it can be difficult to train and evaluate the span finder due to memory limits and the time it takes to evaluate a huge number of predicted spans.	2023-07-06 09:55:34 +02:00
Madeesh Kannan	8113cfb257	`Language.replace_listeners`: Pass the replaced listener and the `tok2vec` pipe to the callback (#12785) * `Language.replace_listeners`: Pass the replaced listener and the `tok2vec` pipe to the callback * Update developer docs * `isort` fixes * Add error message to assertion * Add clarification to dev docs * Replace assertion with exception * Doc fixes	2023-07-05 13:36:04 +02:00
Sofie Van Landeghem	6f3a71999e	Merge pull request #12784 from explosion/master Merge `master` into `develop`	2023-07-04 15:05:15 +02:00
Tom Aarsen	eab929361d	Use 'exclude' instead of 'disable' (#12783) as suggested by @svlandeg	2023-07-04 11:45:13 +02:00
Marcus Blättermann	bd239511a4	Fix problem with missing syntax highlighting languages causing runtime crash on the website (#12781) * Fix problem with universe pages using `docker` language * Fix problem with universe pages using `r` language * Add fallback, in case code language is unknown	2023-07-03 10:24:25 +02:00
Daniël de Kok	57a230c6e4	Remove section about parallel training with Ray (#12770) The Ray integration is currently broken, having these docs around suggest that this functionality is currently available.	2023-06-28 17:09:57 +02:00
Sofie Van Landeghem	b615964be7	Merge pull request #12752 from danieldk/maintenance/sync-v4-master-20230626 Sync `master` into `v4`	2023-06-28 08:56:54 +01:00
Adriane Boyd	fb0da3e097	Support custom token/lexeme attribute for vectors (#12625) * Support custom token/lexeme attribute for vectors * Fix imports * Back off to ORTH without Vectors.attr * Fallback if vectors.attr doesn't exist * Update docs	2023-06-28 09:43:14 +02:00
Adriane Boyd	337a360cc7	Use spans_ prefix for default span finder scores (#12753)	2023-06-27 19:32:17 +02:00
Adriane Boyd	65f6c9cd10	Support overriding registered functions in configs (#12623) Support overriding registered functions in configs. Previously the registry name was parsed as a section name rather than as a registry name.	2023-06-27 17:36:33 +02:00
Adriane Boyd	c067b5264c	Address issues with source with component names and replacing listeners (#12701) When sourcing a component, the object from the original pipeline is added to the new pipeline as the same object. This creates a situation where there are several attributes that cannot be in sync between the original pipeline and the new pipeline at the same time for this one object: * component.name * component.listener_map / component.listening_components for tok2vec and transformer When running replace_listeners on a component, the config is not updated correctly if the state of the component is incorrect for the current pipeline (in particular changes that should be applied from model.attrs["replace_listener_cfg"] as used in spacy-transformers) due to the fact that: * find_listeners relies on component.name to set the name in the listener_map * replace_listeners relies on listener_map to determine how to modify the configs In addition, there are several places where pipeline components are modified and the listener map and/or internal component names aren't currently updated. In cases where there is a component shared by two pipelines that cannot be in sync, this PR chooses to prioritize the most recently modified or initialized pipeline. There is no actual solution with the current source behavior that will make both pipelines usable, so the current pipeline is updated whenever components are added/renamed/removed or the pipeline is initialized for training.	2023-06-27 10:47:07 +02:00
Daniël de Kok	8b2732e276	Fix training.callbacks <-> language import cycle	2023-06-26 12:43:45 +02:00
Daniël de Kok	122f3b32ad	Fix span <-> underscore import cycle	2023-06-26 12:43:21 +02:00
Daniël de Kok	bf92ca4f10	Merge remote-tracking branch 'upstream/master' into v4-isort	2023-06-26 12:43:00 +02:00
Daniël de Kok	2468742cb8	isort all the things	2023-06-26 11:41:03 +02:00
Daniël de Kok	68089f65cd	Configure isort to use the Black profile, recursively isort the `spacy` module	2023-06-26 11:40:32 +02:00
Adriane Boyd	e1664217f5	Add spancat_singlelabel to debug data CLI (#12749)	2023-06-26 10:25:20 +02:00
Daniël de Kok	17c4a3d646	Set version to v4.0.0.dev1 (#12748)	2023-06-23 09:43:41 +02:00
Sofie Van Landeghem	95619b6736	Merge pull request #12717 from danieldk/sync-v4-master-20230612 Merge master into v4	2023-06-22 17:44:57 +01:00
Daniël de Kok	096794dd74	Account for differences between Span.sents in spaCy 3/4	2023-06-22 15:38:22 +02:00
Adriane Boyd	cb4fdc83e4	Merge pull request #12742 from adrianeboyd/chore/v3.6.0 Set version to v3.6.0	2023-06-21 15:34:28 +02:00
Adriane Boyd	34971bcbd1	Set version to v3.6.0	2023-06-21 12:59:36 +02:00
Adriane Boyd	dd5e00c735	Temporarily skip tests for compat table	2023-06-21 12:59:36 +02:00

16293 Commits