Commit Graph

15810 Commits

Author SHA1 Message Date
Sofie Van Landeghem
6d03b04901
Improve score_cats for use with multiple textcat components (#11820)
* add test for running evaluate on an nlp pipeline with two distinct textcat components

* cleanup

* merge dicts instead of overwrite

* don't add more labels to the given set

* Revert "merge dicts instead of overwrite"

This reverts commit 89bee0ed77.

* Switch tests to separate scorer keys rather than merged dicts

* Revert unrelated edits

* Switch textcat scorers to v2

* formatting

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-01-09 11:43:48 +01:00
Madeesh Kannan
f1dcdefc8a
Add version tag to before_update config key (#12059) 2023-01-05 11:46:04 +01:00
Sofie Van Landeghem
7f6c638c3a
fix processing of "auto" in convert (#12050)
* fix processing of "auto" in walk_directory

* add check for None

* move AUTO check to convert and fix verification of args

* add specific CLI test with CliRunner

* cleanup

* more cleanup

* update docstring
2023-01-05 10:21:00 +01:00
Paul O'Leary McCann
dbd829f0ed
Fix inconsistency in displaCy docs about page option (#12047)
* Fix inconsistency in displaCy docs about page option

The `page` option, which wraps the output SVG in HTML, is true by
default for `serve` but not for `render`. The `render` docs were wrong
though, so this updates them.

* Update the same statement in more docs

A few renderers used the same language
2023-01-04 12:51:40 +09:00
Wannaphong Phatthiyaphaibun
31c1beba78
Add spacy-pythainlp (#12038)
* Add spacy-pythainlp

* Move submission to right section

* Minor cleanup

* Remove extra list call

* Update universe.json

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2023-01-03 17:03:59 +09:00
github-actions[bot]
abb0ab109d
Auto-format code with black (#12035)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2023-01-02 11:59:57 +01:00
Adriane Boyd
ef9e504eac
Rename modified textcat scorer to v2 (#11971)
As a follow-up to #11696, rename the modified scorer to v2 and move the
v1 scorer to `spacy-legacy`.
2022-12-29 14:01:08 +01:00
kadarakos
933b54ac79
typo fix (#11995) 2022-12-26 13:26:35 +01:00
Madeesh Kannan
aa2b471a6e
New console logger with expanded progress tracking (#11972)
* Add `ConsoleLogger.v3`

This addition expands the progress bar feature to count up the training/distillation steps to either the next evaluation pass or the maximum number of steps.

* Rename progress bar types

* Add defaults to docs
Minor fixes

* Move comment

* Minor punctuation fixes

* Explicitly check for `None` when validating progress bar type

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2022-12-23 15:21:44 +01:00
github-actions[bot]
90896504a5
Auto-format code with black (#12019)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-12-23 12:44:07 +01:00
Adriane Boyd
64d2d27c5d
Add classifier for python 3.11 (#12013) 2022-12-22 10:53:16 +01:00
Raphael Mitsch
eef3d950b4
Fix SpanGroup and Span typing (#12009)
* Correct Span.label, Span.kb_id types. Fix SpanGroup.__iter__().

* Extend test.

* Rename test. Fix typo.

* Add comment.

* Fix types for Span.label, Span.kb_id, Span.char_span().

* Update spacy/tests/doc/test_span_group.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update docs.

* Fix typo.

* Update spacy/tokens/span_group.pyx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-12-21 18:54:27 +01:00
kadarakos
c223cd7a86
Add apply CLI (#11376)
* annotate cli first try

* add batch-size and n_process

* rename to apply

* typing fix

* handle file suffixes

* walk directories

* support jsonl

* typing fix

* remove debug

* make suffix optional for walk

* revert unrelated

* don't warn but raise

* better error message

* minor touch up

* Update spacy/tests/test_cli.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy/cli/apply.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/cli/apply.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* update tests and bugfix

* add force_overwrite

* typo

* fix adding .spacy suffix

* Update spacy/cli/apply.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/cli/apply.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/cli/apply.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* store user data and rename cmd arg

* include test for user attr

* rename cmd arg

* better help message

* documentation

* prettier

* black

* link fix

* Update spacy/cli/apply.py

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>

* Update website/docs/api/cli.md

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>

* Update website/docs/api/cli.md

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>

* Update website/docs/api/cli.md

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>

* addressing reviews

* dont quit but warn

* prettier

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2022-12-20 17:11:33 +01:00
Jos Polfliet
18ffe5bbd6
Update stop_words.py (#11997)
fix typo in "aangaande"
2022-12-19 16:17:49 +01:00
cfuerbachersparks
3a2b655a29
Update lexeme.md (#11994)
Change suffix_ string to end
2022-12-19 10:33:38 +01:00
Adriane Boyd
c9d9d6847f
Update build constraints for python 3.11 (#11981) 2022-12-15 10:55:01 +01:00
Adriane Boyd
e5c7f3b077
CI: Install thinc-apple-ops through extra (#11963) 2022-12-12 10:13:10 +01:00
Adriane Boyd
0591e67265
Cast to uint64 for all array-based doc representations (#11933)
* Convert all individual values explicitly to uint64 for array-based doc representations

* Temporarily test with latest numpy v1.24.0rc

* Remove unnecessary conversion from attr_t

* Reduce number of individual casts

* Convert specifically from int32 to uint64

* Revert "Temporarily test with latest numpy v1.24.0rc"

This reverts commit eb0e3c5006.

* Also use int32 in tests
2022-12-12 08:45:35 +01:00
Adriane Boyd
8c291ace0c
Extend to wasabi v1.1 (#11945)
* Extend to wasabi v1.1

* Temporarily run mypy and tests with newest wasabi

* Temporarily skip check requirements test

* Revert "Temporarily skip check requirements test"

This reverts commit 44f4ce20a8.

* Revert "Temporarily run mypy and tests with newest wasabi"

This reverts commit e677a2257c.
2022-12-12 08:38:36 +01:00
github-actions[bot]
f22fc7a113
Auto-format code with black (#11955)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-12-09 10:15:52 +01:00
vincent d warmerdam
6d2ca1ab3a
Update custom solutions links (#11903)
* Update custom solutions

Will now point to https://explosion.ai/custom-solutions

* added-sidebar

* added-analysis-to-readme

* update-landing-page
2022-12-07 16:02:09 +01:00
Paul O'Leary McCann
73919336fb
Remove spacy-sentence-segmenter from Universe (#11932) 2022-12-07 15:56:03 +01:00
Paul O'Leary McCann
5c3a60e8f4
Add in errors used in the beam code that were removed at some point (#11935)
I don't think there's any way to use the beam code at the moment, but as
long as it's around the errors it refers to should also be present.
2022-12-07 15:52:35 +01:00
Paul O'Leary McCann
916191848a
Update scattertext example code (#11937)
* Update scattertext example code

* Remove PMI Filter Threshold
2022-12-07 18:09:04 +09:00
Daniël de Kok
27fac7df2e
EditTreeLemmatizer: correctly add strings when initializing from labels (#11934)
Strings in replacement nodes where not added to the `StringStore`
when `EditTreeLemmatizer` was initialized from a set of labels. The
corresponding test did not capture this because it added the strings
through the examples that were passed to the initialization.

This change fixes both this bug in the initialization as the 'shadowing'
of the bug in the test.
2022-12-07 13:53:41 +09:00
Zhangrp
23085ffef4
Fix interpolation in directory names, see #11235. (#11914) 2022-12-06 17:42:12 +09:00
Ryn Daniels
1aadcfcb37
update lock-threads to v4 (#11930) 2022-12-05 10:17:10 +01:00
Adriane Boyd
8afa8b5a7b
Refactor kwargs in CLI msg for future wasabi compatibility (#11918)
Necessary for mypy with wasabi v1+.
2022-12-05 10:00:00 +01:00
Darigov Research
6f342bdd72
docs: Adds link to license in readme (#11924)
Would resolve https://github.com/explosion/spaCy/issues/11923 if merged
2022-12-05 09:49:04 +01:00
Paul O'Leary McCann
5848656b5e
Switch ubuntu-latest to ubuntu-20.04 in main tests (#11928)
* Switch ubuntu-latest to ubuntu-20.04 in main tests

* Only use 20.04 for 3.6
2022-12-05 09:43:23 +01:00
Sofie Van Landeghem
4b2097a271
fix links (#11927) 2022-12-05 16:29:13 +09:00
github-actions[bot]
df0cb4b77b
Auto-format code with black (#11913)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-12-02 14:49:12 +01:00
Paul O'Leary McCann
f9d17a644b
Config generation fails for GPU without transformers (#11899)
If you don't have spacy-transformers installed, but try to use `init
config` with the GPU flag, you'll get an error. The issue is that the
`use_transformers` flag in the config is conflated with the GPU flag,
and then there's an attempt to access transformers config info that may
not exist.

There may be a better way to do this, but this stops the error.
2022-12-02 10:17:11 +01:00
Adriane Boyd
445c670a2d
Fix spancat for zero suggestions (#11860)
* Add test for spancat predict with zero suggestions

* Fix spancat for zero suggestions

* Undo changes to extract_spans

* Use .sum() as in update
2022-12-02 09:33:52 +01:00
Zhangrp
9cf3fa9711
Add docs for biluo_to_iob and iob_to_biluo. (#11901)
* Add docs for biluo_to_iob and iob_to_biluo.

* Fix typos.

* Remove redundant links.
2022-12-01 13:30:27 +01:00
Damian Romero
afd7a2476d
Fix typo in vocab.md table (#11908)
* Fix typo in vocab.md table

Fixes explosion/spaCy/#11907

* Reformat vocab.md with Prettier
2022-12-01 13:06:28 +01:00
Adriane Boyd
6f9d630f7e
Replace Pipe type with Callable in Language (#11803)
* Replace Pipe type with Callable in Language

* Use Callable[[Doc], Doc] in the docstrings
2022-11-29 13:20:08 +01:00
Paul O'Leary McCann
f1e0243450
Remove macro auc per type from textcat defaults (#11887)
This appears to have been added by mistake and never used. Removing it
does not break validation.
2022-11-29 11:50:23 +01:00
Adriane Boyd
e0d43557b7
Merge pull request #11871 from adrianeboyd/chore/v3.5.0
Prepare for v3.5.0
2022-11-29 11:41:32 +01:00
Adriane Boyd
1ebe7db07c
Support local filesystem remotes for projects (#11762)
* Support local filesystem remotes for projects

* Fix support for local filesystem remotes for projects
  * Use `FluidPath` instead of `Pathy` to support both filesystem and
    remote paths
  * Create missing parent directories if required for local filesystem
  * Add a more general `_file_exists` method to support both `Pathy`,
    `Path`, and `smart_open`-compatible URLs
* Add explicit `smart_open` dependency starting with support for
  `compression` flag
* Update `pathy` dependency to exclude older versions that aren't
  compatible with required `smart_open` version
* Update docs to refer to `Pathy` instead of `smart_open` for project
  remotes (technically you can still push to any `smart_open`-compatible
  path but you can't pull from them)
* Add tests for local filesystem remotes

* Update pathy for general BlobStat sorting

* Add import

* Remove _file_exists since only Pathy remotes are supported

* Format CLI docs

* Clean up merge
2022-11-29 11:40:58 +01:00
Sofie Van Landeghem
96c9cf3448
Merge pull request #11855 from essenmitsosse/move-styleguide-out-of-readme
Move Styleguide out of Readme
2022-11-28 21:22:56 +01:00
Paul O'Leary McCann
f54bfb56c9
Don't throw an error if using displacy on an unset span key (#11845)
* Don't throw an error if using displacy on an unset span key

* List available keys in W117
2022-11-28 10:01:09 +01:00
Zhangrp
9f986af120
Add example sentence for Chinese in website meta (#11879) 2022-11-28 14:50:30 +09:00
Marcus Blättermann
5c9faf6eea
Update menu for styleguide
This reflects the removed parts from ecbf052abd
2022-11-27 03:48:05 +01:00
Marcus Blättermann
90141202c0
Merge branch 'move-styleguide-out-of-readme' into migrate-to-next-web-17 2022-11-27 03:48:03 +01:00
Marcus Blättermann
7f2ea20fee
Update README.md 2022-11-27 03:47:11 +01:00
Marcus Blättermann
c23d54fd26
Remove MDX tags from README.md 2022-11-27 03:47:11 +01:00
Adriane Boyd
681ec20914
Add smart_open requirement, update deprecated options (#11864)
* Switch from deprecated `ignore_ext` to `compression`
* Add upload/download test for local files
2022-11-25 13:00:57 +01:00
Adriane Boyd
32396e0bda Set version to v3.5.0 2022-11-25 12:05:25 +01:00
Adriane Boyd
378db0eb1e Temporarily skip tests that require models/compat 2022-11-25 12:05:25 +01:00