Marcus Blättermann
c23d54fd26
Remove MDX tags from README.md
2022-11-27 03:47:11 +01:00
Adriane Boyd
681ec20914
Add smart_open requirement, update deprecated options ( #11864 )
...
* Switch from deprecated `ignore_ext` to `compression`
* Add upload/download test for local files
2022-11-25 13:00:57 +01:00
Adriane Boyd
32396e0bda
Set version to v3.5.0
2022-11-25 12:05:25 +01:00
Adriane Boyd
378db0eb1e
Temporarily skip tests that require models/compat
2022-11-25 12:05:25 +01:00
Raphael Mitsch
b1d458eca7
Add generate_from_disk() factory method.
2022-11-25 12:02:37 +01:00
Raphael Mitsch
c0fd8a2e71
find-threshold: CLI command for multi-label classifier threshold tuning ( #11280 )
...
* Add foundation for find-threshold CLI functionality.
* Finish first draft for find-threshold.
* Add tests.
* Revert adjusted import statements.
* Fix mypy errors.
* Fix imports.
* Harmonize arguments with spacy evaluate command.
* Generalize component and threshold handling. Harmonize arguments with 'spacy evaluate' CLI.
* Fix Spancat test.
* Add beta parameter to Scorer and PRFScore.
* Make beta a component scorer setting.
* Remove beta.
* Update nlp.config (workaround).
* Reload pipeline on threshold change. Adjust tests. Remove confection reference.
* Remove assumption of component being a Pipe object or having a .cfg attribute.
* Adjust test output and reference values.
* Remove beta references. Delete universe.json.
* Reverting unnecessary changes. Removing unused default values. Renaming variables in find-cli tests.
* Update spacy/cli/find_threshold.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Remove adding labels in tests.
* Remove unused error
* Undo changes to PRFScorer
* Change default value for n_trials. Log table iteratively.
* Add warnings for pointless applications of find_threshold().
* Fix imports.
* Adjust type check of TextCategorizer to exclude subclasses.
* Change check for whether there's only one unique value in scores.
* Update spacy/cli/find_threshold.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Incorporate feedback.
* Fix test issue. Update docstring.
* Update docs & docstring.
* Update spacy/tests/test_cli.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Add examples to docs. Rename _nlp to nlp in tests.
* Update spacy/cli/find_threshold.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/cli/find_threshold.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-11-25 11:44:55 +01:00
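A rough illustration of the threshold tuning that the find-threshold commit above describes: try a number of candidate thresholds for a multi-label classifier and keep the one with the best score. This is only a sketch, not the code in `spacy/cli/find_threshold.py`. The function names, the data layout (one dict of label scores per document), the use of micro-averaged F1 and the `n_trials` default of 11 are all assumptions made for the example.

```python
from typing import Dict, Sequence, Tuple


def f1_at_threshold(
    scores: Sequence[Dict[str, float]],
    gold: Sequence[Dict[str, bool]],
    threshold: float,
) -> float:
    """Micro-averaged F1 when a label is predicted iff its score >= threshold."""
    tp = fp = fn = 0
    for doc_scores, doc_gold in zip(scores, gold):
        for label, score in doc_scores.items():
            predicted = score >= threshold
            actual = doc_gold.get(label, False)
            if predicted and actual:
                tp += 1
            elif predicted:
                fp += 1
            elif actual:
                fn += 1
    if tp == 0:
        # No true positives: precision/recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def find_best_threshold(
    scores: Sequence[Dict[str, float]],
    gold: Sequence[Dict[str, bool]],
    n_trials: int = 11,  # hypothetical default, chosen for this sketch
) -> Tuple[float, float]:
    """Try n_trials evenly spaced thresholds in [0, 1]; return (threshold, f1).

    Ties are resolved in favour of the lowest threshold.
    """
    candidates = [i / (n_trials - 1) for i in range(n_trials)]
    results = [(t, f1_at_threshold(scores, gold, t)) for t in candidates]
    return max(results, key=lambda pair: pair[1])
```

If every candidate yields the same score, the search says nothing useful about the threshold, which is presumably the situation the "only one unique value in scores" check in the commit body guards against.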
kadarakos
dece775279
correct ndim in docs ( #11869 )
2022-11-25 11:31:28 +01:00
Adriane Boyd
30d31fd335
Update Russian and Ukrainian lemmatizers ( #11811 )
...
* pymorphy2 issues #11620 , #11626 , #11625 :
- #11620 : pymorphy2_lookup
- #11626 : handle multiple forms pointing to the same normal form + handling empty POS tag
- #11625 : matching DET that are labelled as PRON by pymorphy2
* Move lemmatizer algorithm changes back into RussianLemmatizer
* Fix uk pymorphy3_lookup mode init
* Move and update tests for ru/uk lookup lemmatizer modes
* Fix typo
* Remove traces of previous behavior for uninflected POS
* Refactor to private generic-looking pymorphy methods
* Remove xfailed uk lemmatizer cases
* Update spacy/lang/ru/lemmatizer.py
Co-authored-by: Richard Hudson <richard@explosion.ai>
Co-authored-by: Dmytro S Lituiev <d.lituiev@gmail.com>
Co-authored-by: Richard Hudson <richard@explosion.ai>
2022-11-25 11:12:46 +01:00
Adriane Boyd
8f062b849c
Fix Matcher cython profile=True header ( #11867 )
2022-11-24 16:03:42 +01:00
Raphael Mitsch
4eb072fa91
Add abstract method KnowledgeBase.__len__().
2022-11-23 21:24:17 +01:00
Madeesh Kannan
5ea14af32b
Add training.before_update callback ( #11739 )
...
* Add `training.before_update` callback
This callback can be used to implement training paradigms like gradual (un)freezing of components (e.g: the Transformer) after a certain number of training steps to mitigate catastrophic forgetting during fine-tuning.
* Fix type annotation, default config value
* Generalize arguments passed to the callback
* Update schema
* Pass `epoch` to callback, rename `current_step` to `step`
* Add test
* Simplify test
* Replace config string with `spacy.blank`
* Apply suggestions from code review
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Cleanup imports
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-11-23 17:54:58 +01:00
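The gradual (un)freezing use case mentioned in the `training.before_update` commit above could look roughly like the following. This is a sketch under stated assumptions, not the spaCy implementation: it assumes the callback is called before each update with the pipeline object and a dict holding `step` and `epoch` (the commit bullets name both, but the exact signature is an assumption here), and `FakePipeline`, `frozen` and `create_gradual_unfreeze` are hypothetical stand-ins so the example runs without spaCy.

```python
from typing import Any, Callable, Dict, List, Set


class FakePipeline:
    """Hypothetical stand-in for a pipeline; only tracks frozen component names."""

    def __init__(self, components: List[str]) -> None:
        self.components = components
        self.frozen: Set[str] = set()


def create_gradual_unfreeze(
    component: str, unfreeze_at_step: int
) -> Callable[[Any, Dict[str, Any]], None]:
    """Build a callback that keeps `component` frozen until `unfreeze_at_step`."""

    def before_update(nlp: Any, args: Dict[str, Any]) -> None:
        # `args` is assumed to carry at least "step" and "epoch".
        if args["step"] < unfreeze_at_step:
            nlp.frozen.add(component)
        else:
            nlp.frozen.discard(component)

    return before_update


# Usage: freeze the transformer for the first 100 training steps.
callback = create_gradual_unfreeze("transformer", unfreeze_at_step=100)
pipeline = FakePipeline(["transformer", "ner"])
callback(pipeline, {"step": 0, "epoch": 0})
frozen_at_start = set(pipeline.frozen)
callback(pipeline, {"step": 100, "epoch": 1})
frozen_after_100 = set(pipeline.frozen)
```

In a real training setup the callback would be registered and referenced from the training config rather than called by hand; how that registration is spelled is not shown in the log above, so it is left out here.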
Paul O'Leary McCann
8271cfb4cd
Remove Learning Path spaCy ( #11846 )
2022-11-23 11:03:18 +01:00
Paul O'Leary McCann
f1ddac187d
Remove unused error object ( #11837 )
2022-11-23 10:51:31 +01:00
Raphael Mitsch
ca915e1ae9
Merge branch 'master' into feature/candidate-generation-by-docs
2022-11-23 09:41:06 +01:00
Marcus Blättermann
ecbf052abd
Remove README.md
content from styleguide
2022-11-23 02:04:54 +01:00
Marcus Blättermann
5659eeaadd
Remove styleguide content from README.md
2022-11-23 02:04:54 +01:00
Marcus Blättermann
8c0ceca637
Move README.md
content to styleguide
2022-11-23 02:04:54 +01:00
Marcus Blättermann
0794e5c6cc
Add missing files to project structure in README.md
2022-11-23 02:04:54 +01:00
Marcus Blättermann
96218a1e8f
Delete styleguide.md
...
This is an intermediate commit, so the content of `/README.md` can be moved to the styleguide, but the history is kept
2022-11-23 02:04:54 +01:00
Marcus Blättermann
9d96e44a87
Apply Prettier to README.md
2022-11-23 02:04:49 +01:00
Marco Edward Gorelli
f0d8309a28
fix comparison of constants ( #11834 )
...
Co-authored-by: MarcoGorelli <>
2022-11-21 08:12:03 +01:00
github-actions[bot]
89bfd06fbd
Auto-format code with black ( #11826 )
...
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-11-18 18:24:13 +09:00
Paul O'Leary McCann
e3173bd86d
Remove spikex from Universe ( #11825 )
2022-11-18 08:24:22 +01:00
Adriane Boyd
a83463c5e0
Add transformer recommendation for ca ( #11819 )
...
Model recommendation from @cayorodriguez.
2022-11-18 08:15:27 +01:00
Paul O'Leary McCann
75bb7ad541
Check textcat values for validity ( #11763 )
...
* Check textcat values for validity
* Fix error numbers
* Clean up vals reference
* Check category value validity through training
The _validate_categories is called in update, which for multilabel is
inherited from the single label component.
* Formatting
2022-11-17 10:25:01 +01:00
Raphael Mitsch
1480009715
Make entity_vector_length available in Python.
2022-11-16 16:16:20 +01:00
Raphael Mitsch
aa2b5122b6
Make entity_vector_length available in Python.
2022-11-16 16:07:39 +01:00
Raphael Mitsch
d6d4c45eef
Make entity_vector_length writable.
2022-11-16 15:52:34 +01:00
Adriane Boyd
317b6ef99c
Update to mypy 0.990 ( #11801 )
2022-11-16 14:09:10 +01:00
Paul O'Leary McCann
c0c54e44bc
Add equality definition for vectors ( #11806 )
...
* Add equality definition for vectors
This re-uses the check from sourcing components.
* Use the equality check
* Format
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-11-16 09:44:42 +01:00
Sofie Van Landeghem
caa9efad59
prevent rewriting an already raw URL ( #11810 )
2022-11-15 14:15:00 +01:00
Denis Bezykornov
7e684ad691
Update Russian tokenizer exceptions ( #11753 )
...
* Fix typos, add a couple of new abbreviations, remove nonbreaking spaces
* Remove space from abbreviation
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-11-15 11:37:25 +01:00
Peter Baumgartner
9baa686f82
remove migration support form ( #11802 )
2022-11-14 16:53:14 +01:00
Paul O'Leary McCann
bb523d4d91
Remove spacy-ray from docs ( #11781 )
...
* Remove spacy ray from cli docs
* Remove more ray docs
* Remove ray from universe
2022-11-14 19:58:38 +09:00
Edward
3478ff1eb0
remove new v2 tags ( #11780 )
2022-11-14 17:41:01 +09:00
github-actions[bot]
188a7d00eb
Auto-format code with black ( #11792 )
...
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-11-11 09:58:31 +01:00
Jacobo Myerston
322b5dc1df
Add greCy to Universe ( #11774 )
...
* Update universe.json
* Update universe.json
fixes Github value
2022-11-10 13:21:20 +09:00
Raphael Mitsch
b572e2473a
Update docstring.
2022-11-09 14:31:22 +01:00
Raphael Mitsch
c5b15e0e04
Update docstring.
2022-11-09 14:31:08 +01:00
Adriane Boyd
03eebe9d1c
Update warning, add tests for project requirements check ( #11777 )
...
* Update warning, add tests for project requirements check
* Make warning more general for differences between PEP 508 and pip
* Add tests for _check_requirements
* Parameterize test
2022-11-09 10:59:28 +01:00
Raphael Mitsch
20bbbe3e44
Revert disable/disabled merging behavior ( #11745 )
...
* Merge disable with disabled. Adjust warnings, errors and tests.
* Replace any() with set operation.
* Update spacy/tests/pipeline/test_pipe_methods.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update docs.
* Remove reference to config entry nlp.enabled from docs.
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-11-08 14:58:10 +01:00
Adriane Boyd
2e3cfd758e
Use python 3.10 for GHA universe alert ( #11768 )
2022-11-08 12:46:19 +09:00
Adriane Boyd
e116395f89
Add fallback in requirements check, only check once ( #11735 )
...
* Add fallback in requirements check, only check once
* Rename to skip_requirements_check
* Update spacy/cli/project/run.py
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2022-11-07 14:46:08 +01:00
Adriane Boyd
6105f20d8a
Switch CI to python 3.11 ( #11765 )
2022-11-07 13:25:40 +01:00
Adriane Boyd
e91b47a226
Check for unsafe paths in tarfile.extractall (CVE-2007-4559) ( #11746 )
...
* Adding tarfile member sanitization to extractall()
* Format
* Simplify and add error message
* Fix import
* Add comment about CVE
Co-authored-by: TrellixVulnTeam <charles.mcfarland@trellix.com>
2022-11-07 10:43:34 +01:00
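The kind of member sanitization the CVE-2007-4559 commit above refers to can be sketched with the standard library alone: before calling `extractall()`, check that every archive member resolves to a path inside the target directory. This is a generic illustration of that check, not necessarily the code that was merged; the helper names `is_within_directory` and `safe_extract` and the error message are chosen for this example.

```python
import os
import tarfile


def is_within_directory(directory: str, target: str) -> bool:
    """Return True if `target` resolves to a location inside `directory`."""
    abs_directory = os.path.abspath(directory)
    abs_target = os.path.abspath(target)
    return os.path.commonpath([abs_directory, abs_target]) == abs_directory


def safe_extract(tar: tarfile.TarFile, path: str) -> None:
    """Extract `tar` into `path`, refusing members that would escape it."""
    for member in tar.getmembers():
        member_path = os.path.join(path, member.name)
        if not is_within_directory(path, member_path):
            raise ValueError(f"Unsafe path in tar archive: {member.name}")
    tar.extractall(path)
```

A member named `../evil.txt` is rejected before anything is written to disk. The check is purely lexical, so it does not cover symlink members that point outside the target directory; recent Python versions also offer extraction filters in `tarfile` for that purpose.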
Paul O'Leary McCann
b76222e56a
Raise Typer limit ( #11720 )
...
* Raise typer limit to <0.7.0
* Raise limit to <0.8.0
2022-11-07 08:11:55 +01:00
Adriane Boyd
ea326cf47d
Fix types for Span.id and Span.id_ ( #11744 )
2022-11-07 08:11:13 +01:00
Raphael Mitsch
b398cca5cc
Replace leftover Generator typing with Iterator.
2022-11-04 12:46:03 +01:00
Raphael Mitsch
7a4ef51807
Merge branch 'explosion:master' into feature/candidate-generation-by-docs
2022-11-04 12:25:25 +01:00
github-actions[bot]
bbf64cfc43
Auto-format code with black ( #11749 )
...
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-11-04 11:17:43 +01:00