15836 Commits

Author SHA1 Message Date
Daniël de Kok
27fac7df2e
EditTreeLemmatizer: correctly add strings when initializing from labels (#11934)
Strings in replacement nodes were not added to the `StringStore`
when `EditTreeLemmatizer` was initialized from a set of labels. The
corresponding test did not capture this because it added the strings
through the examples that were passed to the initialization.

This change fixes both the bug in the initialization and the 'shadowing'
of the bug in the test.
2022-12-07 13:53:41 +09:00
Zhangrp
23085ffef4
Fix interpolation in directory names, see #11235. (#11914) 2022-12-06 17:42:12 +09:00
Ryn Daniels
1aadcfcb37
update lock-threads to v4 (#11930) 2022-12-05 10:17:10 +01:00
Adriane Boyd
8afa8b5a7b
Refactor kwargs in CLI msg for future wasabi compatibility (#11918)
Necessary for mypy with wasabi v1+.
2022-12-05 10:00:00 +01:00
Darigov Research
6f342bdd72
docs: Adds link to license in readme (#11924)
Would resolve https://github.com/explosion/spaCy/issues/11923 if merged
2022-12-05 09:49:04 +01:00
Paul O'Leary McCann
5848656b5e
Switch ubuntu-latest to ubuntu-20.04 in main tests (#11928)
* Switch ubuntu-latest to ubuntu-20.04 in main tests

* Only use 20.04 for 3.6
2022-12-05 09:43:23 +01:00
Sofie Van Landeghem
4b2097a271
fix links (#11927) 2022-12-05 16:29:13 +09:00
github-actions[bot]
df0cb4b77b
Auto-format code with black (#11913)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-12-02 14:49:12 +01:00
Paul O'Leary McCann
f9d17a644b
Config generation fails for GPU without transformers (#11899)
If you don't have spacy-transformers installed, but try to use `init
config` with the GPU flag, you'll get an error. The issue is that the
`use_transformers` flag in the config is conflated with the GPU flag,
and then there's an attempt to access transformers config info that may
not exist.

There may be a better way to do this, but this stops the error.
2022-12-02 10:17:11 +01:00
Adriane Boyd
445c670a2d
Fix spancat for zero suggestions (#11860)
* Add test for spancat predict with zero suggestions

* Fix spancat for zero suggestions

* Undo changes to extract_spans

* Use .sum() as in update
2022-12-02 09:33:52 +01:00
Zhangrp
9cf3fa9711
Add docs for biluo_to_iob and iob_to_biluo. (#11901)
* Add docs for biluo_to_iob and iob_to_biluo.

* Fix typos.

* Remove redundant links.
2022-12-01 13:30:27 +01:00
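
The tag conversion that `biluo_to_iob` and `iob_to_biluo` refer to can be illustrated with a minimal pure-Python sketch. This is not spaCy's implementation: the function names `biluo_to_iob_sketch` and `iob_to_biluo_sketch` are hypothetical, and the sketch assumes well-formed tag sequences in which every entity starts with `B-` (or is a single `U-` token).

```python
from typing import Iterable, List


def biluo_to_iob_sketch(tags: Iterable[str]) -> List[str]:
    """Map BILUO tags to IOB: L- becomes I-, U- becomes B-, the rest is unchanged."""
    out = []
    for tag in tags:
        if tag.startswith("L-"):
            out.append("I-" + tag[2:])
        elif tag.startswith("U-"):
            out.append("B-" + tag[2:])
        else:
            out.append(tag)
    return out


def iob_to_biluo_sketch(tags: Iterable[str]) -> List[str]:
    """Map IOB tags to BILUO by checking whether the next tag continues the entity."""
    tags = list(tags)
    out = []
    for i, tag in enumerate(tags):
        if tag == "O":
            out.append(tag)
            continue
        prefix, label = tag[0], tag[2:]
        # The entity continues only if the next tag is I- with the same label.
        continues = i + 1 < len(tags) and tags[i + 1] == "I-" + label
        if prefix == "B":
            out.append(("B-" if continues else "U-") + label)
        else:
            out.append(("I-" if continues else "L-") + label)
    return out


biluo = ["O", "B-ORG", "I-ORG", "L-ORG", "U-PER", "O"]
iob = biluo_to_iob_sketch(biluo)
round_trip = iob_to_biluo_sketch(iob)
```

Under these assumptions the two functions are inverses of each other, so converting to IOB and back reproduces the original BILUO sequence.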
Damian Romero
afd7a2476d
Fix typo in vocab.md table (#11908)
* Fix typo in vocab.md table

Fixes explosion/spaCy/#11907

* Reformat vocab.md with Prettier
2022-12-01 13:06:28 +01:00
Adriane Boyd
6f9d630f7e
Replace Pipe type with Callable in Language (#11803)
* Replace Pipe type with Callable in Language

* Use Callable[[Doc], Doc] in the docstrings
2022-11-29 13:20:08 +01:00
Paul O'Leary McCann
f1e0243450
Remove macro auc per type from textcat defaults (#11887)
This appears to have been added by mistake and never used. Removing it
does not break validation.
2022-11-29 11:50:23 +01:00
Adriane Boyd
e0d43557b7
Merge pull request #11871 from adrianeboyd/chore/v3.5.0
Prepare for v3.5.0
2022-11-29 11:41:32 +01:00
Adriane Boyd
1ebe7db07c
Support local filesystem remotes for projects (#11762)
* Support local filesystem remotes for projects

* Fix support for local filesystem remotes for projects
  * Use `FluidPath` instead of `Pathy` to support both filesystem and
    remote paths
  * Create missing parent directories if required for local filesystem
  * Add a more general `_file_exists` method to support `Pathy`,
    `Path`, and `smart_open`-compatible URLs
* Add explicit `smart_open` dependency starting with support for
  `compression` flag
* Update `pathy` dependency to exclude older versions that aren't
  compatible with required `smart_open` version
* Update docs to refer to `Pathy` instead of `smart_open` for project
  remotes (technically you can still push to any `smart_open`-compatible
  path but you can't pull from them)
* Add tests for local filesystem remotes

* Update pathy for general BlobStat sorting

* Add import

* Remove _file_exists since only Pathy remotes are supported

* Format CLI docs

* Clean up merge
2022-11-29 11:40:58 +01:00
Sofie Van Landeghem
96c9cf3448
Merge pull request #11855 from essenmitsosse/move-styleguide-out-of-readme
Move Styleguide out of Readme
2022-11-28 21:22:56 +01:00
Paul O'Leary McCann
f54bfb56c9
Don't throw an error if using displacy on an unset span key (#11845)
* Don't throw an error if using displacy on an unset span key

* List available keys in W117
2022-11-28 10:01:09 +01:00
Zhangrp
9f986af120
Add example sentence for Chinese in website meta (#11879) 2022-11-28 14:50:30 +09:00
Marcus Blättermann
5c9faf6eea
Update menu for styleguide
This reflects the removed parts from ecbf052abd
2022-11-27 03:48:05 +01:00
Marcus Blättermann
90141202c0
Merge branch 'move-styleguide-out-of-readme' into migrate-to-next-web-17 2022-11-27 03:48:03 +01:00
Marcus Blättermann
7f2ea20fee
Update README.md 2022-11-27 03:47:11 +01:00
Marcus Blättermann
c23d54fd26
Remove MDX tags from README.md 2022-11-27 03:47:11 +01:00
Adriane Boyd
681ec20914
Add smart_open requirement, update deprecated options (#11864)
* Switch from deprecated `ignore_ext` to `compression`
* Add upload/download test for local files
2022-11-25 13:00:57 +01:00
Adriane Boyd
32396e0bda
Set version to v3.5.0 2022-11-25 12:05:25 +01:00
Adriane Boyd
378db0eb1e
Temporarily skip tests that require models/compat 2022-11-25 12:05:25 +01:00
Raphael Mitsch
c0fd8a2e71
find-threshold: CLI command for multi-label classifier threshold tuning (#11280)
* Add foundation for find-threshold CLI functionality.

* Finish first draft for find-threshold.

* Add tests.

* Revert adjusted import statements.

* Fix mypy errors.

* Fix imports.

* Harmonize arguments with spacy evaluate command.

* Generalize component and threshold handling. Harmonize arguments with 'spacy evaluate' CLI.

* Fix Spancat test.

* Add beta parameter to Scorer and PRFScore.

* Make beta a component scorer setting.

* Remove beta.

* Update nlp.config (workaround).

* Reload pipeline on threshold change. Adjust tests. Remove confection reference.

* Remove assumption of component being a Pipe object or having a .cfg attribute.

* Adjust test output and reference values.

* Remove beta references. Delete universe.json.

* Reverting unnecessary changes. Removing unused default values. Renaming variables in find-cli tests.

* Update spacy/cli/find_threshold.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Remove adding labels in tests.

* Remove unused error

* Undo changes to PRFScorer

* Change default value for n_trials. Log table iteratively.

* Add warnings for pointless applications of find_threshold().

* Fix imports.

* Adjust type check of TextCategorizer to exclude subclasses.

* Change the check for whether there is only one unique value in scores.

* Update spacy/cli/find_threshold.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Incorporate feedback.

* Fix test issue. Update docstring.

* Update docs & docstring.

* Update spacy/tests/test_cli.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Add examples to docs. Rename _nlp to nlp in tests.

* Update spacy/cli/find_threshold.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/cli/find_threshold.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-11-25 11:44:55 +01:00
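
The general idea behind threshold tuning for a multi-label classifier can be sketched as a sweep over evenly spaced candidate thresholds, scoring each one. The sketch below is not the code in `spacy/cli/find_threshold.py`: the function names, the choice of micro-averaged F1 as the metric, the input format, and the `n_trials=11` default are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def micro_f1(scores: List[Dict[str, float]], gold: List[Set[str]], threshold: float) -> float:
    """Micro-averaged F1 when every label scoring >= threshold is predicted."""
    tp = fp = fn = 0
    for doc_scores, doc_gold in zip(scores, gold):
        for label, score in doc_scores.items():
            predicted = score >= threshold
            actual = label in doc_gold
            if predicted and actual:
                tp += 1
            elif predicted:
                fp += 1
            elif actual:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def find_threshold_sketch(
    scores: List[Dict[str, float]], gold: List[Set[str]], n_trials: int = 11
) -> Tuple[float, Dict[float, float]]:
    """Try n_trials thresholds spread evenly over [0, 1]; return the best and all results."""
    thresholds = [i / (n_trials - 1) for i in range(n_trials)]
    results = {t: micro_f1(scores, gold, t) for t in thresholds}
    # On ties, max() keeps the first (lowest) threshold with the top score.
    best = max(thresholds, key=lambda t: results[t])
    return best, results


# Hypothetical per-document label scores and gold label sets.
scores = [{"A": 0.9, "B": 0.4}, {"A": 0.2, "B": 0.7}, {"A": 0.6, "B": 0.1}]
gold = [{"A"}, {"B"}, {"A"}]
best, results = find_threshold_sketch(scores, gold)
```

If every threshold yields the same score, tuning is pointless for that data, which is the kind of situation the warnings mentioned in the commit notes are meant to flag.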
kadarakos
dece775279
correct ndim in docs (#11869) 2022-11-25 11:31:28 +01:00
Adriane Boyd
30d31fd335
Update Russian and Ukrainian lemmatizers (#11811)
* pymorphy2 issues #11620, #11626, #11625:
- #11620: pymorphy2_lookup
- #11626: handle multiple forms pointing to the same normal form + handling empty POS tag
- #11625: matching DET that are labelled as PRON by pymorphy2

* Move lemmatizer algorithm changes back into RussianLemmatizer

* Fix uk pymorphy3_lookup mode init

* Move and update tests for ru/uk lookup lemmatizer modes

* Fix typo

* Remove traces of previous behavior for uninflected POS

* Refactor to private generic-looking pymorphy methods

* Remove xfailed uk lemmatizer cases

* Update spacy/lang/ru/lemmatizer.py

Co-authored-by: Richard Hudson <richard@explosion.ai>

Co-authored-by: Dmytro S Lituiev <d.lituiev@gmail.com>
Co-authored-by: Richard Hudson <richard@explosion.ai>
2022-11-25 11:12:46 +01:00
Adriane Boyd
8f062b849c
Fix Matcher cython profile=True header (#11867) 2022-11-24 16:03:42 +01:00
Madeesh Kannan
5ea14af32b
Add training.before_update callback (#11739)
* Add `training.before_update` callback

This callback can be used to implement training paradigms like gradual (un)freezing of components (e.g. the Transformer) after a certain number of training steps to mitigate catastrophic forgetting during fine-tuning.

* Fix type annotation, default config value

* Generalize arguments passed to the callback

* Update schema

* Pass `epoch` to callback, rename `current_step` to `step`

* Add test

* Simplify test

* Replace config string with `spacy.blank`

* Apply suggestions from code review

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Cleanup imports

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-11-23 17:54:58 +01:00
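
The gradual unfreezing pattern described for `training.before_update` can be sketched without spaCy as follows. Only the `step` and `epoch` keys come from the commit notes above. `ToyPipeline`, `make_unfreeze_callback`, and `run_training` are hypothetical stand-ins, and the way a component is marked as frozen here is an assumption, not spaCy's mechanism.

```python
from typing import Any, Callable, Dict, FrozenSet, List, Set


class ToyPipeline:
    """Hypothetical stand-in for a pipeline object; not a spaCy class."""

    def __init__(self, frozen: Set[str]) -> None:
        self.frozen = set(frozen)


Callback = Callable[[ToyPipeline, Dict[str, Any]], None]


def make_unfreeze_callback(component: str, unfreeze_at_step: int) -> Callback:
    """Build a callback that unfreezes `component` once the given step is reached."""

    def before_update(pipeline: ToyPipeline, info: Dict[str, Any]) -> None:
        if info["step"] >= unfreeze_at_step:
            pipeline.frozen.discard(component)

    return before_update


def run_training(pipeline: ToyPipeline, callback: Callback, n_steps: int) -> List[FrozenSet[str]]:
    """Toy loop: call the callback before each step and record what is frozen."""
    history = []
    for step in range(n_steps):
        callback(pipeline, {"step": step, "epoch": 0})
        history.append(frozenset(pipeline.frozen))
        # A real loop would run the actual parameter update here.
    return history


pipeline = ToyPipeline(frozen={"transformer"})
history = run_training(pipeline, make_unfreeze_callback("transformer", 3), n_steps=5)
```

In this toy run the component stays frozen for steps 0 to 2 and receives updates from step 3 onwards.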
Paul O'Leary McCann
8271cfb4cd
Remove Learning Path spaCy (#11846) 2022-11-23 11:03:18 +01:00
Paul O'Leary McCann
f1ddac187d
Remove unused error object (#11837) 2022-11-23 10:51:31 +01:00
Marcus Blättermann
ecbf052abd
Remove README.md content from styleguide 2022-11-23 02:04:54 +01:00
Marcus Blättermann
5659eeaadd
Remove styleguide content from README.md 2022-11-23 02:04:54 +01:00
Marcus Blättermann
8c0ceca637
Move README.md content to styleguide 2022-11-23 02:04:54 +01:00
Marcus Blättermann
0794e5c6cc
Add missing files to project structure in README.md 2022-11-23 02:04:54 +01:00
Marcus Blättermann
96218a1e8f
Delete styleguide.md
This is an intermediate commit, so the content of `/README.md` can be moved to the styleguide, but the history is kept
2022-11-23 02:04:54 +01:00
Marcus Blättermann
9d96e44a87
Apply Prettier to README.md 2022-11-23 02:04:49 +01:00
Marco Edward Gorelli
f0d8309a28
fix comparison of constants (#11834)
Co-authored-by: MarcoGorelli <>
2022-11-21 08:12:03 +01:00
github-actions[bot]
89bfd06fbd
Auto-format code with black (#11826)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-11-18 18:24:13 +09:00
Paul O'Leary McCann
e3173bd86d
Remove spikex from Universe (#11825) 2022-11-18 08:24:22 +01:00
Adriane Boyd
a83463c5e0
Add transformer recommendation for ca (#11819)
Model recommendation from @cayorodriguez.
2022-11-18 08:15:27 +01:00
Paul O'Leary McCann
75bb7ad541
Check textcat values for validity (#11763)
* Check textcat values for validity

* Fix error numbers

* Clean up vals reference

* Check category value validity through training

The _validate_categories is called in update, which for multilabel is
inherited from the single label component.

* Formatting
2022-11-17 10:25:01 +01:00
Adriane Boyd
317b6ef99c
Update to mypy 0.990 (#11801) 2022-11-16 14:09:10 +01:00
Paul O'Leary McCann
c0c54e44bc
Add equality definition for vectors (#11806)
* Add equality definition for vectors

This re-uses the check from sourcing components.

* Use the equality check

* Format

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-11-16 09:44:42 +01:00
Sofie Van Landeghem
caa9efad59
prevent rewriting an already raw URL (#11810) 2022-11-15 14:15:00 +01:00
Denis Bezykornov
7e684ad691
Update russian tokenizer exceptions (#11753)
* Fix typos, add couple of new abbreviations, remove nonbreaking spaces

* Remove space from abbreviation

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-11-15 11:37:25 +01:00
Peter Baumgartner
9baa686f82
remove migration support form (#11802) 2022-11-14 16:53:14 +01:00
Paul O'Leary McCann
bb523d4d91
Remove spacy-ray from docs (#11781)
* Remove spacy ray from cli docs

* Remove more ray docs

* Remove ray from universe
2022-11-14 19:58:38 +09:00