Commit Graph

3004 Commits

Author SHA1 Message Date
svlandeg
799d226676 prettier formatting 2022-12-05 08:57:24 +01:00
svlandeg
04fea09ffd Merge branch 'copy_master' into copy_v4 2022-12-05 08:56:15 +01:00
Sofie Van Landeghem
4b2097a271
fix links (#11927) 2022-12-05 16:29:13 +09:00
Zhangrp
9cf3fa9711
Add docs for biluo_to_iob and iob_to_biluo. (#11901)
* Add docs for biluo_to_iob and iob_to_biluo.

* Fix typos.

* Remove redundant links.
2022-12-01 13:30:27 +01:00
Damian Romero
afd7a2476d
Fix typo in vocab.md table (#11908)
* Fix typo in vocab.md table

Fixes explosion/spaCy/#11907

* Reformat vocab.md with Prettier
2022-12-01 13:06:28 +01:00
Adriane Boyd
1ebe7db07c
Support local filesystem remotes for projects (#11762)
* Support local filesystem remotes for projects

* Fix support for local filesystem remotes for projects
  * Use `FluidPath` instead of `Pathy` to support both filesystem and
    remote paths
  * Create missing parent directories if required for local filesystem
  * Add a more general `_file_exists` method to support both `Pathy`,
    `Path`, and `smart_open`-compatible URLs
* Add explicit `smart_open` dependency starting with support for
  `compression` flag
* Update `pathy` dependency to exclude older versions that aren't
  compatible with required `smart_open` version
* Update docs to refer to `Pathy` instead of `smart_open` for project
  remotes (technically you can still push to any `smart_open`-compatible
  path but you can't pull from them)
* Add tests for local filesystem remotes

* Update pathy for general BlobStat sorting

* Add import

* Remove _file_exists since only Pathy remotes are supported

* Format CLI docs

* Clean up merge
2022-11-29 11:40:58 +01:00
Sofie Van Landeghem
96c9cf3448
Merge pull request #11855 from essenmitsosse/move-styleguide-out-of-readme
Move Styleguide out of Readme
2022-11-28 21:22:56 +01:00
Zhangrp
9f986af120
Add example sentence for Chinese in website meta (#11879) 2022-11-28 14:50:30 +09:00
Marcus Blättermann
5c9faf6eea
Update menu for styleguide
This reflects the removed parts from ecbf052abd
2022-11-27 03:48:05 +01:00
Marcus Blättermann
90141202c0
Merge branch 'move-styleguide-out-of-readme' into migrate-to-next-web-17 2022-11-27 03:48:03 +01:00
Marcus Blättermann
7f2ea20fee
Update README.md 2022-11-27 03:47:11 +01:00
Marcus Blättermann
c23d54fd26
Remove MDX tags from README.md 2022-11-27 03:47:11 +01:00
Raphael Mitsch
c0fd8a2e71
find-threshold: CLI command for multi-label classifier threshold tuning (#11280)
* Add foundation for find-threshold CLI functionality.

* Finish first draft for find-threshold.

* Add tests.

* Revert adjusted import statements.

* Fix mypy errors.

* Fix imports.

* Harmonize arguments with spacy evaluate command.

* Generalize component and threshold handling. Harmonize arguments with 'spacy evaluate' CLI.

* Fix Spancat test.

* Add beta parameter to Scorer and PRFScore.

* Make beta a component scorer setting.

* Remove beta.

* Update nlp.config (workaround).

* Reload pipeline on threshold change. Adjust tests. Remove confection reference.

* Remove assumption of component being a Pipe object or having a .cfg attribute.

* Adjust test output and reference values.

* Remove beta references. Delete universe.json.

* Reverting unnecessary changes. Removing unused default values. Renaming variables in find-cli tests.

* Update spacy/cli/find_threshold.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Remove adding labels in tests.

* Remove unused error

* Undo changes to PRFScorer

* Change default value for n_trials. Log table iteratively.

* Add warnings for pointless applications of find_threshold().

* Fix imports.

* Adjust type check of TextCategorizer to exclude subclasses.

* Change check of if there's only one unique value in scores.

* Update spacy/cli/find_threshold.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Incorporate feedback.

* Fix test issue. Update docstring.

* Update docs & docstring.

* Update spacy/tests/test_cli.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Add examples to docs. Rename _nlp to nlp in tests.

* Update spacy/cli/find_threshold.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/cli/find_threshold.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-11-25 11:44:55 +01:00
kadarakos
dece775279
correct ndim in docs (#11869) 2022-11-25 11:31:28 +01:00
Madeesh Kannan
5ea14af32b
Add training.before_update callback (#11739)
* Add `training.before_update` callback

This callback can be used to implement training paradigms like gradual (un)freezing of components (e.g: the Transformer) after a certain number of training steps to mitigate catastrophic forgetting during fine-tuning.

* Fix type annotation, default config value

* Generalize arguments passed to the callback

* Update schema

* Pass `epoch` to callback, rename `current_step` to `step`

* Add test

* Simplify test

* Replace config string with `spacy.blank`

* Apply suggestions from code review

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Cleanup imports

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-11-23 17:54:58 +01:00
Edward
e79910d57e
Remove sentiment extension (#11722)
* remove sentiment attribute

* remove sentiment from docs

* add test for backwards compatibility

* replace from_disk with from_bytes

* Fix docs and format file

* Fix formatting
2022-11-23 13:09:32 +01:00
Paul O'Leary McCann
8271cfb4cd
Remove Learning Path spaCy (#11846) 2022-11-23 11:03:18 +01:00
Marcus Blättermann
ecbf052abd
Remove README.md content from styleguide 2022-11-23 02:04:54 +01:00
Marcus Blättermann
5659eeaadd
Remove styleguide content from README.md 2022-11-23 02:04:54 +01:00
Marcus Blättermann
8c0ceca637
Move README.md content to styleguide 2022-11-23 02:04:54 +01:00
Marcus Blättermann
0794e5c6cc
Add missing files to project structure in README.md 2022-11-23 02:04:54 +01:00
Marcus Blättermann
96218a1e8f
Delete styleguide.md
This is in intermediate commit, so the content of `/README.md`can be moved to the styleguid, but the history is kept
2022-11-23 02:04:54 +01:00
Marcus Blättermann
9d96e44a87
Apply Prettier to README.md 2022-11-23 02:04:49 +01:00
Paul O'Leary McCann
e3173bd86d
Remove spikex from Universe (#11825) 2022-11-18 08:24:22 +01:00
Peter Baumgartner
9baa686f82
remove migration support form (#11802) 2022-11-14 16:53:14 +01:00
Paul O'Leary McCann
bb523d4d91
Remove spacy-ray from docs (#11781)
* Remove spacy ray from cli docs

* Remove more ray docs

* Remove ray from universe
2022-11-14 19:58:38 +09:00
Edward
3478ff1eb0
remove new v2 tags (#11780) 2022-11-14 17:41:01 +09:00
Jacobo Myerston
322b5dc1df
Add greCy to Universe (#11774)
* Update universe.json

* Update universe.json

fixes Github value
2022-11-10 13:21:20 +09:00
Raphael Mitsch
20bbbe3e44
Revert disable/disabled merging behavior (#11745)
* Merge disable with disabled. Adjust warnings, errors and tests.

* Replace any() with set operation.

* Update spacy/tests/pipeline/test_pipe_methods.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update docs.

* Remve reference to config entry nlp.enabled from docs.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-11-08 14:58:10 +01:00
Adriane Boyd
68b8fa2df2 Merge remote-tracking branch 'upstream/master' into chore/update-v4-from-master-4 2022-11-03 09:42:36 +01:00
Adriane Boyd
420b1d854b
Update textcat scorer threshold behavior (#11696)
* Update textcat scorer threshold behavior

For `textcat` (with exclusive classes) the scorer should always use a
threshold of 0.0 because there should be one predicted label per doc and
the numeric score for that particular label should not matter.

* Rename to test_textcat_multilabel_threshold

* Remove all uses of threshold for multi_label=False

* Update Scorer.score_cats API docs

* Add tests for score_cats with thresholds

* Update textcat API docs

* Fix types

* Convert threshold back to float

* Fix threshold type in docstring

* Improve formatting in Scorer API docs
2022-11-02 15:35:04 +01:00
Aaron Zipp
d25f09468c
Spelling mistake in rule-based-matching.md (#11717)
Changed retokenize to retokenizer
2022-10-31 13:27:12 +09:00
Paul O'Leary McCann
6b78135b9e
Add warning to install widget for M1 GPUs (#11666)
* Add warning to install widget for M1 GPUs

* Use Thinc tracking issue instead

* Update website/src/widgets/quickstart-install.js

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Underline URL in warning

* Update website/src/widgets/quickstart-install.js

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Don't install cupy on m1 gpus

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-10-27 15:08:24 +02:00
Adriane Boyd
8740e4341f
Update languages and version in README and website (#11694) 2022-10-25 14:54:54 +02:00
Adriane Boyd
cae4589f5a
Replace EntityRuler with SpanRuler implementation (#11320)
* Replace EntityRuler with SpanRuler implementation

Remove `EntityRuler` and rename the `SpanRuler`-based
`future_entity_ruler` to `entity_ruler`.

Main changes:

* It is no longer possible to load patterns on init as with
`EntityRuler(patterns=)`.
* The older serialization formats (`patterns.jsonl`) are no longer
supported and the related tests are removed.
* The config settings are only stored in the config, not in the
serialized component (in particular the `phrase_matcher_attr` and
overwrite settings).

* Add migration guide to EntityRuler API docs

* docs update

* Minor edit

Co-authored-by: svlandeg <svlandeg@github.com>
2022-10-24 09:11:35 +02:00
Adriane Boyd
103b24fb25 Merge remote-tracking branch 'upstream/master' into chore/update-v4-from-master 2022-10-21 09:13:32 +02:00
Adriane Boyd
6c380d4fc6 Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.5 2022-10-20 13:45:17 +02:00
Adriane Boyd
7e56701057 Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.5 2022-10-20 13:38:49 +02:00
Cellan Hall
b69d249a22
Adding spacy-cleaner to the spaCy universe (#11674)
* added spacy-cleaner to the spaCy universe

* Move data to righ section of universe.json

* Cleanup

- fix typo ("replacers")
- spaCy doesn't need to be marked as code
- lemma of "Hello" is lower case

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2022-10-20 20:38:29 +09:00
Paul O'Leary McCann
bf83f6872a
Add detailed example of env dict usage (#11677)
* Add detailed example of env dict usage

* Mark code blocks as yaml
2022-10-20 20:35:03 +09:00
Paul O'Leary McCann
858565a567
Fix issues with DVC commands (#11592)
* Fix flag handling in dvc

Prior to this commit, if a flag (--verbose or --quiet) was passed to
DVC, it would be added to the end of the generated dvc command line.
This would result in the command being interpreted as part of the actual
command to run, rather than an argument to dvc. This would result in
command lines like:

    spacy project run preprocess --verbose

That would fail with an error that there's no such directory as
`--verbose`.

This change puts the flags at the front of the dvc command so that they
are interpreted correctly. It removes the `run_dvc_commands` function,
which had been reduced to just a for loop and wasn't used elsewhere.

A separate problem is that there's no way to specify the quiet behaviour
to dvc from the command line, though it's unclear if that's a bug.

* Add dvc quiet flag to docs

* Handle case in DVC where no commands are appropriate

If only have commands with no deps or outputs (admittedly unlikely), you
get a weird error about the dvc file not existing. This gives explicit
output instead.

* Add support for quiet flag

* Fix command execution

Commands are strings now because they're joined further up.
2022-10-18 15:11:39 +09:00
Paul O'Leary McCann
2e52479eec
Fix example code for spacy-wordnet (#11593)
* Fix example code for spacy-wordnet

It looks like in the most recent version, 0.1.0, it's no longer possible
to pass the lang parameter to the component separately. Doing so will
raise an error.

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Cleanup

* More cleanup

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-10-11 16:45:05 +02:00
Madeesh Kannan
446a3ecf34
StringStore refactoring (#11344)
* `strings`: Remove unused `hash32_utf8` function

* `strings`: Make `hash_utf8` and `decode_Utf8Str` private

* `strings`: Reorganize private functions

* 'strings': Raise error when non-string/-int types are passed to functions that don't accept them

* `strings`: Add `items()` method, add type hints, remove unused methods, restrict inputs to specific types, reorganize methods

* `Morphology`: Use `StringStore.items()` to enumerate features when pickling

* `test_stringstore`: Update pre-Python 3 tests

* Update `StringStore` docs

* Fix `get_string_id` imports

* Replace redundant test with tests for type checking

* Rename `_retrieve_interned_str`, remove `.get` default arg

* Add `get_string_id` to `strings.pyi`
Remove `mypy` ignore directives from imports of the above

* `strings.pyi`: Replace functions that consume `Union`-typed params with overloads

* `strings.pyi`: Revert some function signatures

* Update `SYMBOLS_BY_INT` lookups and error codes post-merge

* Revert clobbered change introduced in a previous merge

* Remove unnecessary type hint

* Invert tuple order in `StringStore.items()`

* Add test for `StringStore.items()`

* Revert "`Morphology`: Use `StringStore.items()` to enumerate features when pickling"

This reverts commit 1af9510ceb.

* Rename `keys` and `key_map`

* Add `keys()` and `values()`

* Add comment about the inverted key-value semantics in the API

* Fix type hints

* Implement `keys()`, `values()`, `items()` without generators

* Fix type hints, remove unnecessary boxing

* Update docs

* Simplify `keys/values/items()` impl

* `mypy` fix

* Fix error message, doc fixes
2022-10-06 10:51:06 +02:00
Sofie Van Landeghem
b187076a2d
fix docs (#11573) 2022-10-03 17:01:04 +02:00
svlandeg
e3027c65b8 Merge branch 'copy_develop' into copy_v4 2022-10-03 14:12:16 +02:00
svlandeg
9c8cdb403e Merge branch 'master_copy' into develop_copy 2022-09-30 15:40:26 +02:00
Gabriele Picco
ff9002b726
Add Zshot Spacy plugin (#11557)
* Add Zshot Spacy plugin

Add Zshot (Zero and Few shot named entity & relationships recognition) Spacy plugin

* Update website/meta/universe.json

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/meta/universe.json

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-09-29 17:34:44 +02:00
Paul O'Leary McCann
ba63f57f81
Update docs to reflect Doc input to Language (#11555) 2022-09-29 18:50:29 +09:00
Taniguchi Yasufumi
9557b0fb01
Add spacy-partial-tagger to spaCy Universe (#11538) 2022-09-27 14:11:50 +02:00
Paul O'Leary McCann
a44b7d4622
Add experimental coref docs (#11291)
* Add experimental coref docs

* Docs cleanup

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Apply changes from code review

* Fix prettier formatting

It seems a period after a number made this think it was a list?

* Update docs on examples for initialize

* Add docs for coref scorers

* Remove 3.4 notes from coref

There won't be a "new" tag until it's in core.

* Add docs for span cleaner

* Fix docs

* Fix docs to match spacy-experimental

These weren't properly updated when the code was moved out of spacy
core.

* More doc fixes

* Formatting

* Update architectures

* Fix links

* Fix another link

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: svlandeg <svlandeg@github.com>
2022-09-27 18:11:23 +09:00