16293 Commits

Author SHA1 Message Date
Adriane Boyd
4f37e4031c
Update spacy/ml/tb_framework.pyx
Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
2023-07-20 09:59:19 +02:00
svlandeg
96f2e30c4b cython fixes and cleanup 2023-07-19 17:41:29 +02:00
svlandeg
846472129c merge fixes (2) 2023-07-19 16:38:37 +02:00
svlandeg
47a82c6164 merge fixes 2023-07-19 16:38:29 +02:00
svlandeg
0e3b6a87d6 Merge branch 'upstream_master' into sync_v4 2023-07-19 16:37:31 +02:00
Sofie Van Landeghem
ea54d1775a
Merge pull request #12840 from svlandeg/sync_develop
Sync develop
2023-07-19 13:12:51 +02:00
svlandeg
79ec68f01b Merge branch 'upstream_master' into sync_develop 2023-07-19 12:08:52 +02:00
Basile Dura
b0228d8ea6
ci: add cython linter (#12694)
* chore: add cython-linter dev dependency

* fix: lexeme.pyx

* fix: morphology.pxd

* fix: tokenizer.pxd

* fix: vocab.pxd

* fix: morphology.pxd (line length)

* ci: add cython-lint

* ci: fix cython-lint call

* Fix kb/candidate.pyx.

* Fix kb/kb.pyx.

* Fix kb/kb_in_memory.pyx.

* Fix kb.

* Fix training/ partially.

* Fix training/. Ignore trailing whitespaces and too long lines.

* Fix ml/.

* Fix matcher/.

* Fix pipeline/.

* Fix tokens/.

* Fix build errors. Fix vocab.pyx.

* Fix cython-lint install and run.

* Fix lexeme.pyx, parts_of_speech.pxd, vectors.pyx. Temporarily disable cython-lint execution.

* Fix attrs.pyx, lexeme.pyx, symbols.pxd, isort issues.

* Make cython-lint install conditional. Fix tokenizer.pyx.

* Fix remaining files. Reenable cython-lint check.

* Readded parentheses.

* Fix test_build_dependencies().

* Add explanatory comment to cython-lint execution.

---------

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
2023-07-19 12:03:31 +02:00
Adriane Boyd
1509c96694
Clean up unused code in Language (#12836)
Follow-up to #12701.
2023-07-18 14:10:30 +02:00
Adriane Boyd
6bf7c65329
Update matcher pattern validation tests (#12835)
- parametrize over individual token patterns (as originally intended, as
far as I can tell)
- add a test for lowercase `in` in patterns
2023-07-18 10:00:07 +02:00
Adriane Boyd
95075298f5
Update pex Makefile defaults (#12832)
* Update pex Makefile defaults

- switch to python 3.8
- only install spacy-lookups-data for extra packages

* Update website for pex defaults
2023-07-18 09:29:04 +02:00
Ian Thompson
ef20e114e0
Typo fix in Language.replace_listeners docs (#12823)
* modified:   spacy/language.py
	- corrected typo in docstring for :method:`Language.replace_listeners`
	- added noqa comment on unused local variable assignment in :method:`Language.from_config` as I wasn't sure if it should be unassigned

modified:   website/docs/api/language.mdx
	- corrected typo in `Language.replace_listeners` markdown

* modified:   spacy/language.py
	- removed noqa comment

---------

Co-authored-by: Ian Thompson <ian.thompson@hrblock.com>
2023-07-14 09:45:54 +02:00
Connor Brinton
0566c3a166
🐛 Escape annotated HTML tags in span renderer (#12817)
These changes add a missing call to `escape_html` in the displaCy span
renderer. Previously span-annotated tokens would be inserted into the
page markup without being escaped, resulting in potentially incorrect
rendering. When I encountered this issue, it resulted in some docs and
span underlines being superimposed on top of properly rendered docs and
span underlines near the beginning of the visualization (due to an
unescaped `<span>` tag).
2023-07-13 17:33:05 +02:00
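The escaping problem described in 0566c3a166 can be illustrated with a minimal sketch. This is not displaCy's renderer: `render_span_token` is a hypothetical helper, and the standard library's `html.escape` stands in for the `escape_html` call named in the commit message.

```python
from html import escape


def render_span_token(text: str, label: str) -> str:
    """Wrap a token in a <span>, escaping the token text first.

    Hypothetical helper for illustration only; the real span renderer
    uses its own templates.
    """
    # Without escape(), a token such as "<span>" would be parsed as markup
    # and could disturb the layout of everything rendered after it.
    return f'<span class="token" data-label="{escape(label)}">{escape(text)}</span>'


print(render_span_token("<span>", "TAG"))
# → <span class="token" data-label="TAG">&lt;span&gt;</span>
```

Escaping at the point where token text enters the markup keeps the fix local: callers can pass raw text and never need to know which characters are unsafe.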
Sofie Van Landeghem
ddffd09602
Trainable lemmatizer docs link (#12795)
* add an anchor to the trainable lemmatizer section

* add requirement for morphologizer,tagger to rule-based lemmatizer

* morphologizer only
2023-07-07 15:18:16 +02:00
Adriane Boyd
1a55661cfb
Update website binder version to v3.6 (#12805) 2023-07-07 10:52:33 +02:00
Adriane Boyd
41dba5bd34
Update max_length default in span finder docs (#12803) 2023-07-07 10:17:41 +02:00
Sofie Van Landeghem
b1b20bf69d
Replace projects functionality with weasel (#12769)
* Setting up weasel branch (#12456)

* remove project-specific functionality

* remove project-specific tests

* remove project-specific schemas

* remove project-specific information in about

* remove project-specific functions in util.py

* remove project-specific error strings

* remove project-specific CLI commands

* black formatting

* restore some functions that are used beyond projects

* remove project imports

* remove imports

* remove remote_storage tests

* remove one more project unit test

* update for PR 12394

* remove get_hash and get_checksum

* remove upload_ and download_file methods

* remove ensure_pathy

* revert clumsy fingers

* reinstate E970

* feat: use weasel as spacy project command (#12473)

* feat: use weasel as spacy project command

* build: use constrained requirement for weasel

* feat: add weasel to the library requirements

* build: update weasel to new version

* build: use specific weasel tag

* build: use weasel-0.1.0rc1 from PyPI

* fix: remove weasel from requirements.txt

* fix: requirements.txt and setup.cfg need to reflect each other

* feat: remove legacy spacy project code

* bump version

* further merge fixes

* isort

---------

Co-authored-by: Basile Dura <bdura@users.noreply.github.com>
2023-07-07 09:10:27 +02:00
Sofie Van Landeghem
9e63006b12
Merge pull request #12800 from explosion/master_copy
Sync develop with master
2023-07-07 08:44:19 +02:00
svlandeg
991bcc111e disable tests until 3.7 models are available 2023-07-07 08:09:57 +02:00
Madeesh Kannan
d195923164
Set version to 3.7.0.dev0 (#12799) 2023-07-06 18:29:03 +02:00
svlandeg
d26e4e0849 Revert "feat: add example stubs (#12679)"
This reverts commit 30bb34533a.
2023-07-06 17:02:38 +02:00
Basile Dura
30bb34533a
feat: add example stubs (#12679)
* feat: add example stubs

* fix: add required annotations

* fix: mypy issues

* fix: use Py36-compatible Protocol

* Minor reformatting

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: svlandeg <svlandeg@github.com>
2023-07-06 16:49:43 +02:00
Sofie Van Landeghem
536798f9e3
Disallow False for first/last arguments of add_pipe (#12793)
* Literal True for first/last options

* add test case

* update docs

* remove old redundant test case

* black formatting

* use Optional typing in docstrings

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>

---------

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
2023-07-06 15:20:13 +02:00
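The "Literal True for first/last options" change in 536798f9e3 can be sketched as below. The function `resolve_insert_index` is hypothetical and is not spaCy's `add_pipe`; the runtime check for `False` is also an assumption, since the log does not say whether the restriction is enforced at runtime or only through type annotations.

```python
from typing import Literal, Optional


def resolve_insert_index(
    n_components: int,
    *,
    first: Optional[Literal[True]] = None,
    last: Optional[Literal[True]] = None,
) -> int:
    """Return the position at which a new component would be inserted.

    Hypothetical helper: `first` and `last` accept only True or None,
    so a type checker flags `first=False` before the code runs.
    """
    # Assumed runtime guard for callers that bypass type checking.
    if first is False or last is False:
        raise ValueError("first/last must be True or omitted, not False")
    if first and last:
        raise ValueError("only one of first/last may be set")
    if first:
        return 0
    # Default, and also the behaviour for last=True: append at the end.
    return n_components


print(resolve_insert_index(3, first=True))  # → 0
print(resolve_insert_index(3))              # → 3
```

Typing the flags as `Optional[Literal[True]]` removes the ambiguous reading of `first=False` (does it mean "last", or "no preference"?) by making it unrepresentable.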
Adriane Boyd
6fc153a266
Merge pull request #12794 from adrianeboyd/chore/v3.6.0-2
Reenable compat+models tests for v3.6.0
2023-07-06 13:22:21 +02:00
Adriane Boyd
4e19ec7eb8
Docs for v3.6.0 (#12792)
* Docs for v3.6.0

* Add sl performance

* Add da trf note
2023-07-06 12:58:25 +02:00
Adriane Boyd
76329e1dde Revert "Temporarily skip download CLI related tests in CI"
This reverts commit 46ce66021a.
2023-07-06 12:48:06 +02:00
Adriane Boyd
a1191146f5 Revert "Temporarily skip tests for compat table"
This reverts commit dd5e00c735.
2023-07-06 12:47:50 +02:00
Adriane Boyd
830dcca367
SpanFinder: set default max_length to 25 (#12791)
When the default `max_length` is not set and there are longer training
documents, it can be difficult to train and evaluate the span finder due
to memory limits and the time it takes to evaluate a huge number of
predicted spans.
2023-07-06 09:55:34 +02:00
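The memory and runtime concern behind 830dcca367 can be made concrete by counting spans. The sketch below computes an upper bound, assuming every contiguous token sequence could be proposed as a span; the actual span finder only proposes spans from predicted start and end tokens, so real counts are lower. The function name is hypothetical.

```python
from typing import Optional


def max_candidate_spans(n_tokens: int, max_length: Optional[int] = None) -> int:
    """Upper bound on contiguous spans in a document of n_tokens tokens.

    With no cap this is n * (n + 1) / 2, which grows quadratically;
    with a cap of L it grows roughly as n * L.
    """
    longest = n_tokens if max_length is None else min(n_tokens, max_length)
    # A span of length k can start at n_tokens - k + 1 positions.
    return sum(n_tokens - k + 1 for k in range(1, longest + 1))


print(max_candidate_spans(1000))      # → 500500
print(max_candidate_spans(1000, 25))  # → 24700
```

For a 1000-token document, capping the length at 25 reduces the worst case by a factor of about 20, which is consistent with the motivation given in the commit message.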
Madeesh Kannan
8113cfb257
Language.replace_listeners: Pass the replaced listener and the tok2vec pipe to the callback (#12785)
* `Language.replace_listeners`: Pass the replaced listener and the `tok2vec` pipe to the callback

* Update developer docs

* `isort` fixes

* Add error message to assertion

* Add clarification to dev docs

* Replace assertion with exception

* Doc fixes
2023-07-05 13:36:04 +02:00
Sofie Van Landeghem
6f3a71999e
Merge pull request #12784 from explosion/master
Merge `master` into `develop`
2023-07-04 15:05:15 +02:00
Tom Aarsen
eab929361d
Use 'exclude' instead of 'disable' (#12783)
as suggested by @svlandeg
2023-07-04 11:45:13 +02:00
Marcus Blättermann
bd239511a4
Fix problem with missing syntax highlighting languages causing runtime crash on the website (#12781)
* Fix problem with universe pages using `docker` language

* Fix problem with universe pages using `r` language

* Add fallback, in case code language is unknown
2023-07-03 10:24:25 +02:00
Daniël de Kok
57a230c6e4
Remove section about parallel training with Ray (#12770)
The Ray integration is currently broken; having these docs around
suggests that this functionality is currently available.
2023-06-28 17:09:57 +02:00
Sofie Van Landeghem
b615964be7
Merge pull request #12752 from danieldk/maintenance/sync-v4-master-20230626
Sync `master` into `v4`
2023-06-28 08:56:54 +01:00
Adriane Boyd
fb0da3e097
Support custom token/lexeme attribute for vectors (#12625)
* Support custom token/lexeme attribute for vectors

* Fix imports

* Back off to ORTH without Vectors.attr

* Fallback if vectors.attr doesn't exist

* Update docs
2023-06-28 09:43:14 +02:00
Adriane Boyd
337a360cc7
Use spans_ prefix for default span finder scores (#12753) 2023-06-27 19:32:17 +02:00
Adriane Boyd
65f6c9cd10
Support overriding registered functions in configs (#12623)
Support overriding registered functions in configs. Previously the registry name was parsed as a section name rather than as a registry name.
2023-06-27 17:36:33 +02:00
Adriane Boyd
c067b5264c
Address issues with source with component names and replacing listeners (#12701)
When sourcing a component, the object from the original pipeline is added to the new pipeline as the same object. This creates a situation where there are several attributes that cannot be in sync between the original pipeline and the new pipeline at the same time for this one object:

* component.name
* component.listener_map / component.listening_components for tok2vec and transformer

When running replace_listeners on a component, the config is not updated correctly if the state of the component is incorrect for the current pipeline (in particular changes that should be applied from model.attrs["replace_listener_cfg"] as used in spacy-transformers) due to the fact that:

* find_listeners relies on component.name to set the name in the listener_map
* replace_listeners relies on listener_map to determine how to modify the configs

In addition, there are several places where pipeline components are modified and the listener map and/or internal component names aren't currently updated.

In cases where there is a component shared by two pipelines that cannot be in sync, this PR chooses to prioritize the most recently modified or initialized pipeline. There is no actual solution with the current source behavior that will make both pipelines usable, so the current pipeline is updated whenever components are added/renamed/removed or the pipeline is initialized for training.
2023-06-27 10:47:07 +02:00
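The shared-object problem described in c067b5264c can be shown with a toy model. `Component` and `Pipeline` below are hypothetical stand-ins, not spaCy classes; the sketch only demonstrates why one object cannot carry two different names at once, and what "prioritize the most recently modified pipeline" means in practice.

```python
class Component:
    """Toy component that stores its own name, as a sourced component does."""

    def __init__(self, name: str) -> None:
        self.name = name


class Pipeline:
    """Toy pipeline that keeps (name, component) pairs."""

    def __init__(self) -> None:
        self.components = []

    def add(self, component: Component, name: str) -> None:
        # The pipeline modified most recently overwrites the shared state.
        component.name = name
        self.components.append((name, component))


source = Pipeline()
tagger = Component("tagger")
source.add(tagger, "tagger")

target = Pipeline()
target.add(tagger, "my_tagger")  # sourcing: the same object, a new name

# Both pipelines hold the identical object, so only one name can be stored.
print(source.components[0][1] is target.components[0][1])  # → True
print(tagger.name)  # → my_tagger
```

After the second `add`, the source pipeline still lists the component under "tagger" while the component itself reports "my_tagger". Any logic keyed on `component.name`, such as the listener map described in the commit message, is then only consistent for the target pipeline.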
Daniël de Kok
8b2732e276 Fix training.callbacks <-> language import cycle 2023-06-26 12:43:45 +02:00
Daniël de Kok
122f3b32ad Fix span <-> underscore import cycle 2023-06-26 12:43:21 +02:00
Daniël de Kok
bf92ca4f10 Merge remote-tracking branch 'upstream/master' into v4-isort 2023-06-26 12:43:00 +02:00
Daniël de Kok
2468742cb8 isort all the things 2023-06-26 11:41:03 +02:00
Daniël de Kok
68089f65cd Configure isort to use the Black profile, recursively isort the spacy module 2023-06-26 11:40:32 +02:00
Adriane Boyd
e1664217f5
Add spancat_singlelabel to debug data CLI (#12749) 2023-06-26 10:25:20 +02:00
Daniël de Kok
17c4a3d646
Set version to v4.0.0.dev1 (#12748) 2023-06-23 09:43:41 +02:00
Sofie Van Landeghem
95619b6736
Merge pull request #12717 from danieldk/sync-v4-master-20230612
Merge master into v4
2023-06-22 17:44:57 +01:00
Daniël de Kok
096794dd74 Account for differences between Span.sents in spaCy 3/4 2023-06-22 15:38:22 +02:00
Adriane Boyd
cb4fdc83e4
Merge pull request #12742 from adrianeboyd/chore/v3.6.0
Set version to v3.6.0
2023-06-21 15:34:28 +02:00
Adriane Boyd
34971bcbd1 Set version to v3.6.0 2023-06-21 12:59:36 +02:00
Adriane Boyd
dd5e00c735 Temporarily skip tests for compat table 2023-06-21 12:59:36 +02:00