Commit Graph

15726 Commits

Author SHA1 Message Date
Zhangrp
9cf3fa9711
Add docs for biluo_to_iob and iob_to_biluo. (#11901)
* Add docs for biluo_to_iob and iob_to_biluo.

* Fix typos.

* Remove redundant links.
2022-12-01 13:30:27 +01:00
Damian Romero
afd7a2476d
Fix typo in vocab.md table (#11908)
* Fix typo in vocab.md table

Fixes explosion/spaCy/#11907

* Reformat vocab.md with Prettier
2022-12-01 13:06:28 +01:00
Adriane Boyd
6f9d630f7e
Replace Pipe type with Callable in Language (#11803)
* Replace Pipe type with Callable in Language

* Use Callable[[Doc], Doc] in the docstrings
2022-11-29 13:20:08 +01:00
Paul O'Leary McCann
f1e0243450
Remove macro auc per type from textcat defaults (#11887)
This appears to have been added by mistake and never used. Removing it
does not break validation.
2022-11-29 11:50:23 +01:00
Adriane Boyd
e0d43557b7
Merge pull request #11871 from adrianeboyd/chore/v3.5.0
Prepare for v3.5.0
2022-11-29 11:41:32 +01:00
Adriane Boyd
1ebe7db07c
Support local filesystem remotes for projects (#11762)
* Support local filesystem remotes for projects

* Fix support for local filesystem remotes for projects
  * Use `FluidPath` instead of `Pathy` to support both filesystem and
    remote paths
  * Create missing parent directories if required for local filesystem
  * Add a more general `_file_exists` method to support both `Pathy`,
    `Path`, and `smart_open`-compatible URLs
* Add explicit `smart_open` dependency starting with support for
  `compression` flag
* Update `pathy` dependency to exclude older versions that aren't
  compatible with required `smart_open` version
* Update docs to refer to `Pathy` instead of `smart_open` for project
  remotes (technically you can still push to any `smart_open`-compatible
  path but you can't pull from them)
* Add tests for local filesystem remotes

* Update pathy for general BlobStat sorting

* Add import

* Remove _file_exists since only Pathy remotes are supported

* Format CLI docs

* Clean up merge
2022-11-29 11:40:58 +01:00
Sofie Van Landeghem
96c9cf3448
Merge pull request #11855 from essenmitsosse/move-styleguide-out-of-readme
Move Styleguide out of Readme
2022-11-28 21:22:56 +01:00
Paul O'Leary McCann
f54bfb56c9
Don't throw an error if using displacy on an unset span key (#11845)
* Don't throw an error if using displacy on an unset span key

* List available keys in W117
2022-11-28 10:01:09 +01:00
Zhangrp
9f986af120
Add example sentence for Chinese in website meta (#11879) 2022-11-28 14:50:30 +09:00
Marcus Blättermann
5c9faf6eea
Update menu for styleguide
This reflects the removed parts from ecbf052abd
2022-11-27 03:48:05 +01:00
Marcus Blättermann
90141202c0
Merge branch 'move-styleguide-out-of-readme' into migrate-to-next-web-17 2022-11-27 03:48:03 +01:00
Marcus Blättermann
7f2ea20fee
Update README.md 2022-11-27 03:47:11 +01:00
Marcus Blättermann
c23d54fd26
Remove MDX tags from README.md 2022-11-27 03:47:11 +01:00
Adriane Boyd
681ec20914
Add smart_open requirement, update deprecated options (#11864)
* Switch from deprecated `ignore_ext` to `compression`
* Add upload/download test for local files
2022-11-25 13:00:57 +01:00
Adriane Boyd
32396e0bda Set version to v3.5.0 2022-11-25 12:05:25 +01:00
Adriane Boyd
378db0eb1e Temporarily skip tests that require models/compat 2022-11-25 12:05:25 +01:00
Raphael Mitsch
c0fd8a2e71
find-threshold: CLI command for multi-label classifier threshold tuning (#11280)
* Add foundation for find-threshold CLI functionality.

* Finish first draft for find-threshold.

* Add tests.

* Revert adjusted import statements.

* Fix mypy errors.

* Fix imports.

* Harmonize arguments with spacy evaluate command.

* Generalize component and threshold handling. Harmonize arguments with 'spacy evaluate' CLI.

* Fix Spancat test.

* Add beta parameter to Scorer and PRFScore.

* Make beta a component scorer setting.

* Remove beta.

* Update nlp.config (workaround).

* Reload pipeline on threshold change. Adjust tests. Remove confection reference.

* Remove assumption of component being a Pipe object or having a .cfg attribute.

* Adjust test output and reference values.

* Remove beta references. Delete universe.json.

* Reverting unnecessary changes. Removing unused default values. Renaming variables in find-cli tests.

* Update spacy/cli/find_threshold.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Remove adding labels in tests.

* Remove unused error

* Undo changes to PRFScorer

* Change default value for n_trials. Log table iteratively.

* Add warnings for pointless applications of find_threshold().

* Fix imports.

* Adjust type check of TextCategorizer to exclude subclasses.

* Change check of if there's only one unique value in scores.

* Update spacy/cli/find_threshold.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Incorporate feedback.

* Fix test issue. Update docstring.

* Update docs & docstring.

* Update spacy/tests/test_cli.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Add examples to docs. Rename _nlp to nlp in tests.

* Update spacy/cli/find_threshold.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/cli/find_threshold.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-11-25 11:44:55 +01:00
kadarakos
dece775279
correct ndim in docs (#11869) 2022-11-25 11:31:28 +01:00
Adriane Boyd
30d31fd335
Update Russian and Ukrainian lemmatizers (#11811)
* pymorph2 issues #11620, #11626, #11625:
- #11620: pymorphy2_lookup
- #11626: handle multiple forms pointing to the same normal form + handling empty POS tag
- #11625: matching DET that are labelled as PRON by pymorhp2

* Move lemmatizer algorithm changes back into RussianLemmatizer

* Fix uk pymorphy3_lookup mode init

* Move and update tests for ru/uk lookup lemmatizer modes

* Fix typo

* Remove traces of previous behavior for uninflected POS

* Refactor to private generic-looking pymorphy methods

* Remove xfailed uk lemmatizer cases

* Update spacy/lang/ru/lemmatizer.py

Co-authored-by: Richard Hudson <richard@explosion.ai>

Co-authored-by: Dmytro S Lituiev <d.lituiev@gmail.com>
Co-authored-by: Richard Hudson <richard@explosion.ai>
2022-11-25 11:12:46 +01:00
Adriane Boyd
8f062b849c
Fix Matcher cython profile=True header (#11867) 2022-11-24 16:03:42 +01:00
Madeesh Kannan
5ea14af32b
Add training.before_update callback (#11739)
* Add `training.before_update` callback

This callback can be used to implement training paradigms like gradual (un)freezing of components (e.g: the Transformer) after a certain number of training steps to mitigate catastrophic forgetting during fine-tuning.

* Fix type annotation, default config value

* Generalize arguments passed to the callback

* Update schema

* Pass `epoch` to callback, rename `current_step` to `step`

* Add test

* Simplify test

* Replace config string with `spacy.blank`

* Apply suggestions from code review

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Cleanup imports

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-11-23 17:54:58 +01:00
Paul O'Leary McCann
8271cfb4cd
Remove Learning Path spaCy (#11846) 2022-11-23 11:03:18 +01:00
Paul O'Leary McCann
f1ddac187d
Remove unused error object (#11837) 2022-11-23 10:51:31 +01:00
Marcus Blättermann
ecbf052abd
Remove README.md content from styleguide 2022-11-23 02:04:54 +01:00
Marcus Blättermann
5659eeaadd
Remove styleguide content from README.md 2022-11-23 02:04:54 +01:00
Marcus Blättermann
8c0ceca637
Move README.md content to styleguide 2022-11-23 02:04:54 +01:00
Marcus Blättermann
0794e5c6cc
Add missing files to project structure in README.md 2022-11-23 02:04:54 +01:00
Marcus Blättermann
96218a1e8f
Delete styleguide.md
This is in intermediate commit, so the content of `/README.md`can be moved to the styleguid, but the history is kept
2022-11-23 02:04:54 +01:00
Marcus Blättermann
9d96e44a87
Apply Prettier to README.md 2022-11-23 02:04:49 +01:00
Marco Edward Gorelli
f0d8309a28
fix comparison of constants (#11834)
Co-authored-by: MarcoGorelli <>
2022-11-21 08:12:03 +01:00
github-actions[bot]
89bfd06fbd
Auto-format code with black (#11826)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-11-18 18:24:13 +09:00
Paul O'Leary McCann
e3173bd86d
Remove spikex from Universe (#11825) 2022-11-18 08:24:22 +01:00
Adriane Boyd
a83463c5e0
Add transformer recommendation for ca (#11819)
Model recommendation from @cayorodriguez.
2022-11-18 08:15:27 +01:00
Paul O'Leary McCann
75bb7ad541
Check textcat values for validity (#11763)
* Check textcat values for validity

* Fix error numbers

* Clean up vals reference

* Check category value validity through training

The _validate_categories is called in update, which for multilabel is
inherited from the single label component.

* Formatting
2022-11-17 10:25:01 +01:00
Adriane Boyd
317b6ef99c
Update to mypy 0.990 (#11801) 2022-11-16 14:09:10 +01:00
Paul O'Leary McCann
c0c54e44bc
Add equality definition for vectors (#11806)
* Add equality definition for vectors

This re-uses the check from sourcing components.

* Use the equality check

* Format

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-11-16 09:44:42 +01:00
Sofie Van Landeghem
caa9efad59
prevent rewriting an already raw URL (#11810) 2022-11-15 14:15:00 +01:00
Denis Bezykornov
7e684ad691
Update russian tokenizer exceptions (#11753)
* Fix typos, add couple of new abbreviations, remove nonbreaking spaces

* Remove space from abbreviation

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-11-15 11:37:25 +01:00
Peter Baumgartner
9baa686f82
remove migration support form (#11802) 2022-11-14 16:53:14 +01:00
Paul O'Leary McCann
bb523d4d91
Remove spacy-ray from docs (#11781)
* Remove spacy ray from cli docs

* Remove more ray docs

* Remove ray from universe
2022-11-14 19:58:38 +09:00
Edward
3478ff1eb0
remove new v2 tags (#11780) 2022-11-14 17:41:01 +09:00
github-actions[bot]
188a7d00eb
Auto-format code with black (#11792)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-11-11 09:58:31 +01:00
Jacobo Myerston
322b5dc1df
Add greCy to Universe (#11774)
* Update universe.json

* Update universe.json

fixes Github value
2022-11-10 13:21:20 +09:00
Adriane Boyd
03eebe9d1c
Update warning, add tests for project requirements check (#11777)
* Update warning, add tests for project requirements check

* Make warning more general for differences between PEP 508 and pip
* Add tests for _check_requirements

* Parameterize test
2022-11-09 10:59:28 +01:00
Raphael Mitsch
20bbbe3e44
Revert disable/disabled merging behavior (#11745)
* Merge disable with disabled. Adjust warnings, errors and tests.

* Replace any() with set operation.

* Update spacy/tests/pipeline/test_pipe_methods.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update docs.

* Remve reference to config entry nlp.enabled from docs.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-11-08 14:58:10 +01:00
Adriane Boyd
2e3cfd758e
Use python 3.10 for GHA universe alert (#11768) 2022-11-08 12:46:19 +09:00
Adriane Boyd
e116395f89
Add fallback in requirements check, only check once (#11735)
* Add fallback in requirements check, only check once

* Rename to skip_requirements_check

* Update spacy/cli/project/run.py

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2022-11-07 14:46:08 +01:00
Adriane Boyd
6105f20d8a
Switch CI to python 3.11 (#11765) 2022-11-07 13:25:40 +01:00
Adriane Boyd
e91b47a226
Check for unsafe paths in tarfile.extractall (CVE-2007-4559) (#11746)
* Adding tarfile member sanitization to extractall()

* Format

* Simplify and add error message

* Fix import

* Add comment about CVE

Co-authored-by: TrellixVulnTeam <charles.mcfarland@trellix.com>
2022-11-07 10:43:34 +01:00
Paul O'Leary McCann
b76222e56a
Raise Typer limit (#11720)
* Raise typer limit to <0.7.0

* Raise limit to <0.8.0
2022-11-07 08:11:55 +01:00