Commit Graph

2994 Commits

Author SHA1 Message Date
Wannaphong Phatthiyaphaibun
31c1beba78
Add spacy-pythainlp (#12038)
* Add spacy-pythainlp

* Move submission to right section

* Minor cleanup

* Remove extra list call

* Update universe.json

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2023-01-03 17:03:59 +09:00
Madeesh Kannan
aa2b471a6e
New console logger with expanded progress tracking (#11972)
* Add `ConsoleLogger.v3`

This addition expands the progress bar feature to count up the training/distillation steps to either the next evaluation pass or the maximum number of steps.

* Rename progress bar types

* Add defaults to docs
Minor fixes

* Move comment

* Minor punctuation fixes

* Explicitly check for `None` when validating progress bar type

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2022-12-23 15:21:44 +01:00
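A small, library-independent sketch may help picture the progress tracking described in the commit above. The helper name `progress_target` and the mode strings `"eval"` and `"train"` are assumptions made for illustration; this is not spaCy's implementation.

```python
from typing import Optional, Tuple


def progress_target(
    step: int, eval_frequency: int, max_steps: int, mode: Optional[str]
) -> Optional[Tuple[int, int]]:
    """Return (done, total) for a progress bar, or None if no bar is wanted.

    Hypothetical helper: "eval" counts steps up to the next evaluation pass,
    "train" counts steps up to the maximum number of steps.
    """
    if mode is None:  # explicit None check: progress bar disabled
        return None
    if mode == "eval":
        return step % eval_frequency, eval_frequency
    if mode == "train":
        return step, max_steps
    raise ValueError(f"Unknown progress bar type: {mode!r}")
```

With an evaluation every 200 steps and at most 1000 steps, step 250 would be 50 steps into the current evaluation window, or 250 steps into the whole run.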
Raphael Mitsch
eef3d950b4
Fix SpanGroup and Span typing (#12009)
* Correct Span.label, Span.kb_id types. Fix SpanGroup.__iter__().

* Extend test.

* Rename test. Fix typo.

* Add comment.

* Fix types for Span.label, Span.kb_id, Span.char_span().

* Update spacy/tests/doc/test_span_group.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update docs.

* Fix typo.

* Update spacy/tokens/span_group.pyx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-12-21 18:54:27 +01:00
kadarakos
c223cd7a86
Add apply CLI (#11376)
* annotate cli first try

* add batch-size and n_process

* rename to apply

* typing fix

* handle file suffixes

* walk directories

* support jsonl

* typing fix

* remove debug

* make suffix optional for walk

* revert unrelated

* don't warn but raise

* better error message

* minor touch up

* Update spacy/tests/test_cli.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy/cli/apply.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/cli/apply.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* update tests and bugfix

* add force_overwrite

* typo

* fix adding .spacy suffix

* Update spacy/cli/apply.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/cli/apply.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/cli/apply.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* store user data and rename cmd arg

* include test for user attr

* rename cmd arg

* better help message

* documentation

* prettier

* black

* link fix

* Update spacy/cli/apply.py

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>

* Update website/docs/api/cli.md

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>

* Update website/docs/api/cli.md

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>

* Update website/docs/api/cli.md

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>

* addressing reviews

* dont quit but warn

* prettier

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2022-12-20 17:11:33 +01:00
cfuerbachersparks
3a2b655a29
Update lexeme.md (#11994)
Change suffix_ string to end
2022-12-19 10:33:38 +01:00
vincent d warmerdam
6d2ca1ab3a
Update custom solutions links (#11903)
* Update custom solutions

Will now point to https://explosion.ai/custom-solutions

* added-sidebar

* added-analysis-to-readme

* update-landing-page
2022-12-07 16:02:09 +01:00
Paul O'Leary McCann
73919336fb
Remove spacy-sentence-segmenter from Universe (#11932) 2022-12-07 15:56:03 +01:00
Paul O'Leary McCann
916191848a
Update scattertext example code (#11937)
* Update scattertext example code

* Remove PMI Filter Threshold
2022-12-07 18:09:04 +09:00
Sofie Van Landeghem
4b2097a271
fix links (#11927) 2022-12-05 16:29:13 +09:00
Zhangrp
9cf3fa9711
Add docs for biluo_to_iob and iob_to_biluo. (#11901)
* Add docs for biluo_to_iob and iob_to_biluo.

* Fix typos.

* Remove redundant links.
2022-12-01 13:30:27 +01:00
Damian Romero
afd7a2476d
Fix typo in vocab.md table (#11908)
* Fix typo in vocab.md table

Fixes explosion/spaCy/#11907

* Reformat vocab.md with Prettier
2022-12-01 13:06:28 +01:00
Adriane Boyd
1ebe7db07c
Support local filesystem remotes for projects (#11762)
* Support local filesystem remotes for projects

* Fix support for local filesystem remotes for projects
  * Use `FluidPath` instead of `Pathy` to support both filesystem and
    remote paths
  * Create missing parent directories if required for local filesystem
  * Add a more general `_file_exists` method to support `Pathy`,
    `Path`, and `smart_open`-compatible URLs
* Add explicit `smart_open` dependency starting with support for
  `compression` flag
* Update `pathy` dependency to exclude older versions that aren't
  compatible with required `smart_open` version
* Update docs to refer to `Pathy` instead of `smart_open` for project
  remotes (technically you can still push to any `smart_open`-compatible
  path but you can't pull from them)
* Add tests for local filesystem remotes

* Update pathy for general BlobStat sorting

* Add import

* Remove _file_exists since only Pathy remotes are supported

* Format CLI docs

* Clean up merge
2022-11-29 11:40:58 +01:00
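The "create missing parent directories" step from the commit above can be sketched with the standard library alone. The function name `push_to_local_remote` and its parameters are hypothetical; the sketch only illustrates the idea for a plain local directory and does not reproduce spaCy's project code.

```python
import shutil
from pathlib import Path


def push_to_local_remote(src: Path, remote_dir: Path, rel: str) -> Path:
    """Copy src into remote_dir/rel, creating parent directories as needed."""
    dest = remote_dir / rel
    # A local "remote" may not have the nested layout yet, so create it first.
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)
    return dest
```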
Sofie Van Landeghem
96c9cf3448
Merge pull request #11855 from essenmitsosse/move-styleguide-out-of-readme
Move Styleguide out of Readme
2022-11-28 21:22:56 +01:00
Zhangrp
9f986af120
Add example sentence for Chinese in website meta (#11879) 2022-11-28 14:50:30 +09:00
Marcus Blättermann
5c9faf6eea
Update menu for styleguide
This reflects the removed parts from ecbf052abd
2022-11-27 03:48:05 +01:00
Marcus Blättermann
90141202c0
Merge branch 'move-styleguide-out-of-readme' into migrate-to-next-web-17 2022-11-27 03:48:03 +01:00
Marcus Blättermann
7f2ea20fee
Update README.md 2022-11-27 03:47:11 +01:00
Marcus Blättermann
c23d54fd26
Remove MDX tags from README.md 2022-11-27 03:47:11 +01:00
Raphael Mitsch
c0fd8a2e71
find-threshold: CLI command for multi-label classifier threshold tuning (#11280)
* Add foundation for find-threshold CLI functionality.

* Finish first draft for find-threshold.

* Add tests.

* Revert adjusted import statements.

* Fix mypy errors.

* Fix imports.

* Harmonize arguments with spacy evaluate command.

* Generalize component and threshold handling. Harmonize arguments with 'spacy evaluate' CLI.

* Fix Spancat test.

* Add beta parameter to Scorer and PRFScore.

* Make beta a component scorer setting.

* Remove beta.

* Update nlp.config (workaround).

* Reload pipeline on threshold change. Adjust tests. Remove confection reference.

* Remove assumption of component being a Pipe object or having a .cfg attribute.

* Adjust test output and reference values.

* Remove beta references. Delete universe.json.

* Reverting unnecessary changes. Removing unused default values. Renaming variables in find-cli tests.

* Update spacy/cli/find_threshold.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Remove adding labels in tests.

* Remove unused error

* Undo changes to PRFScorer

* Change default value for n_trials. Log table iteratively.

* Add warnings for pointless applications of find_threshold().

* Fix imports.

* Adjust type check of TextCategorizer to exclude subclasses.

* Change the check for whether there's only one unique value in scores.

* Update spacy/cli/find_threshold.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Incorporate feedback.

* Fix test issue. Update docstring.

* Update docs & docstring.

* Update spacy/tests/test_cli.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Add examples to docs. Rename _nlp to nlp in tests.

* Update spacy/cli/find_threshold.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/cli/find_threshold.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-11-25 11:44:55 +01:00
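The idea behind threshold tuning for a multi-label classifier can be sketched without any library: try a number of evenly spaced thresholds and keep the one with the best F-score. The function names, the F1 metric, and the default of 11 trials are assumptions for illustration, not the behavior of the `find-threshold` command itself.

```python
from typing import List, Sequence, Tuple


def f_score(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def sweep_thresholds(
    scores: Sequence[float], gold: Sequence[bool], n_trials: int = 11
) -> List[Tuple[float, float]]:
    """Evaluate n_trials evenly spaced thresholds in [0, 1] for one label."""
    if n_trials < 2:
        raise ValueError("n_trials must be at least 2")
    results = []
    for i in range(n_trials):
        threshold = i / (n_trials - 1)
        pred = [s >= threshold for s in scores]
        tp = sum(p and g for p, g in zip(pred, gold))
        fp = sum(p and not g for p, g in zip(pred, gold))
        fn = sum(g and not p for p, g in zip(pred, gold))
        results.append((threshold, f_score(tp, fp, fn)))
    return results


def best_threshold(results: List[Tuple[float, float]]) -> float:
    """Return the first threshold that reaches the highest score."""
    return max(results, key=lambda r: r[1])[0]
```

If every trial yields the same score, the sweep says nothing about the threshold, which is the kind of pointless application the commit adds warnings for.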
kadarakos
dece775279
correct ndim in docs (#11869) 2022-11-25 11:31:28 +01:00
Madeesh Kannan
5ea14af32b
Add training.before_update callback (#11739)
* Add `training.before_update` callback

This callback can be used to implement training paradigms like gradual (un)freezing of components (e.g. the Transformer) after a certain number of training steps to mitigate catastrophic forgetting during fine-tuning.

* Fix type annotation, default config value

* Generalize arguments passed to the callback

* Update schema

* Pass `epoch` to callback, rename `current_step` to `step`

* Add test

* Simplify test

* Replace config string with `spacy.blank`

* Apply suggestions from code review

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Cleanup imports

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-11-23 17:54:58 +01:00
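As a rough illustration of the gradual unfreezing use case mentioned above, the sketch below builds a callback that receives a pipeline object and a dictionary of arguments containing `step` and `epoch`, as the commit bullets describe. `ToyPipeline`, `make_unfreezer`, and the `frozen` attribute are hypothetical stand-ins so the example runs without spaCy; how a real callback would be registered and wired into the config is not shown here.

```python
from typing import Any, Callable, Dict, Set


class ToyPipeline:
    """Stand-in for a pipeline; it only tracks which components are frozen."""

    def __init__(self, frozen: Set[str]) -> None:
        self.frozen = set(frozen)


def make_unfreezer(
    component: str, unfreeze_at_step: int
) -> Callable[[ToyPipeline, Dict[str, Any]], None]:
    """Build a callback that unfreezes `component` once enough steps have run."""

    def before_update(nlp: ToyPipeline, args: Dict[str, Any]) -> None:
        # args is assumed to carry at least "step" and "epoch".
        if args["step"] >= unfreeze_at_step:
            nlp.frozen.discard(component)

    return before_update
```

A training loop would call the returned function before each update, so the component stays frozen for the first `unfreeze_at_step` steps and is trained afterwards.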
Paul O'Leary McCann
8271cfb4cd
Remove Learning Path spaCy (#11846) 2022-11-23 11:03:18 +01:00
Marcus Blättermann
ecbf052abd
Remove README.md content from styleguide 2022-11-23 02:04:54 +01:00
Marcus Blättermann
5659eeaadd
Remove styleguide content from README.md 2022-11-23 02:04:54 +01:00
Marcus Blättermann
8c0ceca637
Move README.md content to styleguide 2022-11-23 02:04:54 +01:00
Marcus Blättermann
0794e5c6cc
Add missing files to project structure in README.md 2022-11-23 02:04:54 +01:00
Marcus Blättermann
96218a1e8f
Delete styleguide.md
This is an intermediate commit, so the content of `/README.md` can be moved to the styleguide while the history is kept
2022-11-23 02:04:54 +01:00
Marcus Blättermann
9d96e44a87
Apply Prettier to README.md 2022-11-23 02:04:49 +01:00
Paul O'Leary McCann
e3173bd86d
Remove spikex from Universe (#11825) 2022-11-18 08:24:22 +01:00
Peter Baumgartner
9baa686f82
remove migration support form (#11802) 2022-11-14 16:53:14 +01:00
Paul O'Leary McCann
bb523d4d91
Remove spacy-ray from docs (#11781)
* Remove spacy ray from cli docs

* Remove more ray docs

* Remove ray from universe
2022-11-14 19:58:38 +09:00
Edward
3478ff1eb0
remove new v2 tags (#11780) 2022-11-14 17:41:01 +09:00
Jacobo Myerston
322b5dc1df
Add greCy to Universe (#11774)
* Update universe.json

* Update universe.json

fixes Github value
2022-11-10 13:21:20 +09:00
Raphael Mitsch
20bbbe3e44
Revert disable/disabled merging behavior (#11745)
* Merge disable with disabled. Adjust warnings, errors and tests.

* Replace any() with set operation.

* Update spacy/tests/pipeline/test_pipe_methods.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update docs.

* Remove reference to config entry nlp.enabled from docs.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-11-08 14:58:10 +01:00
Adriane Boyd
420b1d854b
Update textcat scorer threshold behavior (#11696)
* Update textcat scorer threshold behavior

For `textcat` (with exclusive classes) the scorer should always use a
threshold of 0.0 because there should be one predicted label per doc and
the numeric score for that particular label should not matter.

* Rename to test_textcat_multilabel_threshold

* Remove all uses of threshold for multi_label=False

* Update Scorer.score_cats API docs

* Add tests for score_cats with thresholds

* Update textcat API docs

* Fix types

* Convert threshold back to float

* Fix threshold type in docstring

* Improve formatting in Scorer API docs
2022-11-02 15:35:04 +01:00
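The reasoning in the commit above (one predicted label per doc, so the numeric score does not matter) can be shown with plain dictionaries of label scores. The function names are hypothetical, and the sketch is not the scorer's actual code.

```python
from typing import Dict, Set


def predict_exclusive(cats: Dict[str, float]) -> str:
    """Exclusive classes: always pick the single highest-scoring label."""
    return max(cats, key=lambda label: cats[label])


def predict_multilabel(cats: Dict[str, float], threshold: float) -> Set[str]:
    """Multi-label: keep every label whose score reaches the threshold."""
    return {label for label, score in cats.items() if score >= threshold}
```

For scores such as `{"A": 0.4, "B": 0.35, "C": 0.25}`, the exclusive prediction is `"A"` even though no score reaches 0.5, whereas a multi-label prediction at a threshold of 0.5 would be empty. This is why a threshold only makes sense in the multi-label case.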
Aaron Zipp
d25f09468c
Spelling mistake in rule-based-matching.md (#11717)
Changed retokenize to retokenizer
2022-10-31 13:27:12 +09:00
Paul O'Leary McCann
6b78135b9e
Add warning to install widget for M1 GPUs (#11666)
* Add warning to install widget for M1 GPUs

* Use Thinc tracking issue instead

* Update website/src/widgets/quickstart-install.js

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Underline URL in warning

* Update website/src/widgets/quickstart-install.js

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Don't install cupy on m1 gpus

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-10-27 15:08:24 +02:00
Adriane Boyd
8740e4341f
Update languages and version in README and website (#11694) 2022-10-25 14:54:54 +02:00
Adriane Boyd
6c380d4fc6 Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.5 2022-10-20 13:45:17 +02:00
Adriane Boyd
7e56701057 Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.5 2022-10-20 13:38:49 +02:00
Cellan Hall
b69d249a22
Adding spacy-cleaner to the spaCy universe (#11674)
* added spacy-cleaner to the spaCy universe

* Move data to right section of universe.json

* Cleanup

- fix typo ("replacers")
- spaCy doesn't need to be marked as code
- lemma of "Hello" is lower case

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2022-10-20 20:38:29 +09:00
Paul O'Leary McCann
bf83f6872a
Add detailed example of env dict usage (#11677)
* Add detailed example of env dict usage

* Mark code blocks as yaml
2022-10-20 20:35:03 +09:00
Paul O'Leary McCann
858565a567
Fix issues with DVC commands (#11592)
* Fix flag handling in dvc

Prior to this commit, if a flag (--verbose or --quiet) was passed to
DVC, it would be added to the end of the generated dvc command line.
The flag would then be interpreted as part of the actual command to
run, rather than as an argument to dvc, resulting in command lines
like:

    spacy project run preprocess --verbose

That would fail with an error that there's no such directory as
`--verbose`.

This change puts the flags at the front of the dvc command so that they
are interpreted correctly. It removes the `run_dvc_commands` function,
which had been reduced to just a for loop and wasn't used elsewhere.

A separate problem is that there's no way to specify the quiet behaviour
to dvc from the command line, though it's unclear if that's a bug.

* Add dvc quiet flag to docs

* Handle case in DVC where no commands are appropriate

If you only have commands with no deps or outputs (admittedly unlikely), you
get a weird error about the dvc file not existing. This gives explicit
output instead.

* Add support for quiet flag

* Fix command execution

Commands are strings now because they're joined further up.
2022-10-18 15:11:39 +09:00
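The flag-ordering problem described in the commit above can be reproduced with simple list building. The function names and the `["dvc", "run"]` prefix are illustrative assumptions; the sketch does not invoke DVC or mirror spaCy's actual code.

```python
from typing import List, Sequence


def flags_at_end(
    tool: Sequence[str], flags: Sequence[str], wrapped: Sequence[str]
) -> List[str]:
    """Old behaviour: flags trail the wrapped command and become part of it."""
    return [*tool, *wrapped, *flags]


def flags_at_front(
    tool: Sequence[str], flags: Sequence[str], wrapped: Sequence[str]
) -> List[str]:
    """Fixed behaviour: flags come right after the tool, before the command."""
    return [*tool, *flags, *wrapped]
```

With the wrapped command `spacy project run preprocess` and the flag `--verbose`, the first variant ends in `preprocess --verbose`, so the flag is handed to the wrapped command. The second variant places `--verbose` before the wrapped command, where the outer tool can consume it.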
Paul O'Leary McCann
2e52479eec
Fix example code for spacy-wordnet (#11593)
* Fix example code for spacy-wordnet

It looks like in the most recent version, 0.1.0, it's no longer possible
to pass the lang parameter to the component separately. Doing so will
raise an error.

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Cleanup

* More cleanup

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-10-11 16:45:05 +02:00
Sofie Van Landeghem
b187076a2d
fix docs (#11573) 2022-10-03 17:01:04 +02:00
svlandeg
9c8cdb403e Merge branch 'master_copy' into develop_copy 2022-09-30 15:40:26 +02:00
Gabriele Picco
ff9002b726
Add Zshot Spacy plugin (#11557)
* Add Zshot Spacy plugin

Add Zshot (Zero and Few shot named entity & relationships recognition) Spacy plugin

* Update website/meta/universe.json

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/meta/universe.json

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-09-29 17:34:44 +02:00
Paul O'Leary McCann
ba63f57f81
Update docs to reflect Doc input to Language (#11555) 2022-09-29 18:50:29 +09:00
Taniguchi Yasufumi
9557b0fb01
Add spacy-partial-tagger to spaCy Universe (#11538) 2022-09-27 14:11:50 +02:00
Paul O'Leary McCann
a44b7d4622
Add experimental coref docs (#11291)
* Add experimental coref docs

* Docs cleanup

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Apply changes from code review

* Fix prettier formatting

It seems a period after a number made this think it was a list?

* Update docs on examples for initialize

* Add docs for coref scorers

* Remove 3.4 notes from coref

There won't be a "new" tag until it's in core.

* Add docs for span cleaner

* Fix docs

* Fix docs to match spacy-experimental

These weren't properly updated when the code was moved out of spacy
core.

* More doc fixes

* Formatting

* Update architectures

* Fix links

* Fix another link

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: svlandeg <svlandeg@github.com>
2022-09-27 18:11:23 +09:00