Commit Graph

15811 Commits

Author SHA1 Message Date
svlandeg
6852adc8b7 Merge branch 'copy_master' into copy_v4 2023-01-03 13:34:05 +01:00
Wannaphong Phatthiyaphaibun
31c1beba78
Add spacy-pythainlp (#12038)
* Add spacy-pythainlp

* Move submission to right section

* Minor cleanup

* Remove extra list call

* Update universe.json

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2023-01-03 17:03:59 +09:00
github-actions[bot]
abb0ab109d
Auto-format code with black (#12035)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2023-01-02 11:59:57 +01:00
Adriane Boyd
ef9e504eac
Rename modified textcat scorer to v2 (#11971)
As a follow-up to #11696, rename the modified scorer to v2 and move the
v1 scorer to `spacy-legacy`.
2022-12-29 14:01:08 +01:00
Daniël de Kok
20b63943f5
Adjust to new Schedule class and pass scores to Optimizer (#12008)
* Adjust to new `Schedule` class and pass scores to `Optimizer`

Requires https://github.com/explosion/thinc/pull/804

* Bump minimum Thinc requirement to 9.0.0.dev1
2022-12-29 08:03:24 +01:00
kadarakos
933b54ac79
typo fix (#11995) 2022-12-26 13:26:35 +01:00
Madeesh Kannan
aa2b471a6e
New console logger with expanded progress tracking (#11972)
* Add `ConsoleLogger.v3`

This addition expands the progress bar feature to count up the training/distillation steps to either the next evaluation pass or the maximum number of steps.

* Rename progress bar types

* Add defaults to docs
Minor fixes

* Move comment

* Minor punctuation fixes

* Explicitly check for `None` when validating progress bar type

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2022-12-23 15:21:44 +01:00
github-actions[bot]
90896504a5
Auto-format code with black (#12019)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-12-23 12:44:07 +01:00
Daniël de Kok
d30ba9b7b8
Merge pull request #12015 from danieldk/chore/v4-merge-master-20221222
Merge master into v4
2022-12-22 11:22:33 +01:00
Adriane Boyd
64d2d27c5d
Add classifier for python 3.11 (#12013) 2022-12-22 10:53:16 +01:00
Daniël de Kok
2f08deea2a Fix fallout from a previous merge 2022-12-22 10:23:31 +01:00
Daniël de Kok
207565a788 Merge remote-tracking branch 'upstream/master' into chore/v4-merge-master-20221222 2022-12-22 10:08:54 +01:00
Raphael Mitsch
eef3d950b4
Fix SpanGroup and Span typing (#12009)
* Correct Span.label, Span.kb_id types. Fix SpanGroup.__iter__().

* Extend test.

* Rename test. Fix typo.

* Add comment.

* Fix types for Span.label, Span.kb_id, Span.char_span().

* Update spacy/tests/doc/test_span_group.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update docs.

* Fix typo.

* Update spacy/tokens/span_group.pyx

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-12-21 18:54:27 +01:00
kadarakos
c223cd7a86
Add apply CLI (#11376)
* annotate cli first try

* add batch-size and n_process

* rename to apply

* typing fix

* handle file suffixes

* walk directories

* support jsonl

* typing fix

* remove debug

* make suffix optional for walk

* revert unrelated

* don't warn but raise

* better error message

* minor touch up

* Update spacy/tests/test_cli.py

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update spacy/cli/apply.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/cli/apply.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* update tests and bugfix

* add force_overwrite

* typo

* fix adding .spacy suffix

* Update spacy/cli/apply.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/cli/apply.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/cli/apply.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* store user data and rename cmd arg

* include test for user attr

* rename cmd arg

* better help message

* documentation

* prettier

* black

* link fix

* Update spacy/cli/apply.py

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>

* Update website/docs/api/cli.md

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>

* Update website/docs/api/cli.md

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>

* Update website/docs/api/cli.md

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>

* addressing reviews

* dont quit but warn

* prettier

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2022-12-20 17:11:33 +01:00
Jos Polfliet
18ffe5bbd6
Update stop_words.py (#11997)
fix typo in "aangaande"
2022-12-19 16:17:49 +01:00
cfuerbachersparks
3a2b655a29
Update lexeme.md (#11994)
Change suffix_ string to end
2022-12-19 10:33:38 +01:00
Daniël de Kok
f9308aae13
Fix v4 branch to build against Thinc v9 (#11921)
* Move `thinc.extra.search` to `spacy.pipeline._parser_internals`

Backport of:
https://github.com/explosion/spaCy/pull/11317

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Replace references to `thinc.backends.linalg` with `CBlas`

Backport of:
https://github.com/explosion/spaCy/pull/11292

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* Use cross entropy from `thinc.legacy`

* Require thinc>=9.0.0.dev0,<9.1.0

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2022-12-17 14:32:19 +01:00
Adriane Boyd
c9d9d6847f
Update build constraints for python 3.11 (#11981) 2022-12-15 10:55:01 +01:00
Adriane Boyd
e5c7f3b077
CI: Install thinc-apple-ops through extra (#11963) 2022-12-12 10:13:10 +01:00
Edward
ca75190a3d
Custom extensions for spans with equal boundaries (#11429)
* Init

* Fix return type for mypy

* adjust types and improve setting new attributes

* Add underscore changes to json conversion

* Add test and underscore changes to from_docs

* add underscore changes and test to span.to_doc

* update return values

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Add types to function

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* adjust formatting

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* shorten return type

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* add helper function to improve readability

* Improve code and add comments

* rerun azure tests

* Fix tests for json conversion

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2022-12-12 08:55:53 +01:00
Adriane Boyd
0591e67265
Cast to uint64 for all array-based doc representations (#11933)
* Convert all individual values explicitly to uint64 for array-based doc representations

* Temporarily test with latest numpy v1.24.0rc

* Remove unnecessary conversion from attr_t

* Reduce number of individual casts

* Convert specifically from int32 to uint64

* Revert "Temporarily test with latest numpy v1.24.0rc"

This reverts commit eb0e3c5006.

* Also use int32 in tests
2022-12-12 08:45:35 +01:00
Adriane Boyd
8c291ace0c
Extend to wasabi v1.1 (#11945)
* Extend to wasabi v1.1

* Temporarily run mypy and tests with newest wasabi

* Temporarily skip check requirements test

* Revert "Temporarily skip check requirements test"

This reverts commit 44f4ce20a8.

* Revert "Temporarily run mypy and tests with newest wasabi"

This reverts commit e677a2257c.
2022-12-12 08:38:36 +01:00
github-actions[bot]
f22fc7a113
Auto-format code with black (#11955)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-12-09 10:15:52 +01:00
Madeesh Kannan
f5aabaf7d6
Remove unused, experimental multi-task components (#11919)
* Remove experimental multi-task components

These are incomplete implementations and are not usable in their current state.

* Remove orphaned error message

* Switch ubuntu-latest to ubuntu-20.04 in main tests (#11928)

* Switch ubuntu-latest to ubuntu-20.04 in main tests

* Only use 20.04 for 3.6

* Revert "Switch ubuntu-latest to ubuntu-20.04 in main tests (#11928)"

This reverts commit 77c0fd7b17.

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2022-12-08 13:24:45 +01:00
Paul O'Leary McCann
d60997febb
Remove old model shortcuts (#11916)
* Remove old model shortcuts

* Remove error, docs warnings about shortcuts

* Fix import in util

Accidentally deleted the whole import and not just the old part...

* Change universe example to v3 style

* Switch ubuntu-latest to ubuntu-20.04 in main tests (#11928)

* Switch ubuntu-latest to ubuntu-20.04 in main tests

* Only use 20.04 for 3.6

* Update some model loading in Universe

* Add v2 tag to neuralcoref

* Use the spacy-version feature instead of a v2 tag

Co-authored-by: svlandeg <svlandeg@github.com>
2022-12-08 11:45:52 +01:00
Paul O'Leary McCann
6b9af38eeb
Remove all references to "begin_training" (#11943)
When v3 was released, `begin_training` was renamed to `initialize`.
There were warnings in the code and docs about that. This PR removes
them.
2022-12-08 11:43:52 +01:00
vincent d warmerdam
6d2ca1ab3a
Update custom solutions links (#11903)
* Update custom solutions

Will now point to https://explosion.ai/custom-solutions

* added-sidebar

* added-analysis-to-readme

* update-landing-page
2022-12-07 16:02:09 +01:00
Paul O'Leary McCann
73919336fb
Remove spacy-sentence-segmenter from Universe (#11932) 2022-12-07 15:56:03 +01:00
Paul O'Leary McCann
5c3a60e8f4
Add in errors used in the beam code that were removed at some point (#11935)
I don't think there's any way to use the beam code at the moment, but as
long as it's around the errors it refers to should also be present.
2022-12-07 15:52:35 +01:00
Sofie Van Landeghem
60379cec65
Merge pull request #11929 from svlandeg/copy_v4
sync v4 with latest master
2022-12-07 15:24:07 +01:00
Paul O'Leary McCann
916191848a
Update scattertext example code (#11937)
* Update scattertext example code

* Remove PMI Filter Threshold
2022-12-07 18:09:04 +09:00
Daniël de Kok
27fac7df2e
EditTreeLemmatizer: correctly add strings when initializing from labels (#11934)
Strings in replacement nodes where not added to the `StringStore`
when `EditTreeLemmatizer` was initialized from a set of labels. The
corresponding test did not capture this because it added the strings
through the examples that were passed to the initialization.

This change fixes both this bug in the initialization as the 'shadowing'
of the bug in the test.
2022-12-07 13:53:41 +09:00
Zhangrp
23085ffef4
Fix interpolation in directory names, see #11235. (#11914) 2022-12-06 17:42:12 +09:00
Ryn Daniels
1aadcfcb37
update lock-threads to v4 (#11930) 2022-12-05 10:17:10 +01:00
Adriane Boyd
8afa8b5a7b
Refactor kwargs in CLI msg for future wasabi compatibility (#11918)
Necessary for mypy with wasabi v1+.
2022-12-05 10:00:00 +01:00
Darigov Research
6f342bdd72
docs: Adds link to license in readme (#11924)
Would resolve https://github.com/explosion/spaCy/issues/11923 if merged
2022-12-05 09:49:04 +01:00
Paul O'Leary McCann
8267aa1b65 Switch ubuntu-latest to ubuntu-20.04 in main tests (#11928)
* Switch ubuntu-latest to ubuntu-20.04 in main tests

* Only use 20.04 for 3.6
2022-12-05 09:44:19 +01:00
Paul O'Leary McCann
5848656b5e
Switch ubuntu-latest to ubuntu-20.04 in main tests (#11928)
* Switch ubuntu-latest to ubuntu-20.04 in main tests

* Only use 20.04 for 3.6
2022-12-05 09:43:23 +01:00
svlandeg
799d226676 prettier formatting 2022-12-05 08:57:24 +01:00
svlandeg
04fea09ffd Merge branch 'copy_master' into copy_v4 2022-12-05 08:56:15 +01:00
Sofie Van Landeghem
4b2097a271
fix links (#11927) 2022-12-05 16:29:13 +09:00
github-actions[bot]
df0cb4b77b
Auto-format code with black (#11913)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-12-02 14:49:12 +01:00
Paul O'Leary McCann
f9d17a644b
Config generation fails for GPU without transformers (#11899)
If you don't have spacy-transformers installed, but try to use `init
config` with the GPU flag, you'll get an error. The issue is that the
`use_transformers` flag in the config is conflated with the GPU flag,
and then there's an attempt to access transformers config info that may
not exist.

There may be a better way to do this, but this stops the error.
2022-12-02 10:17:11 +01:00
Adriane Boyd
445c670a2d
Fix spancat for zero suggestions (#11860)
* Add test for spancat predict with zero suggestions

* Fix spancat for zero suggestions

* Undo changes to extract_spans

* Use .sum() as in update
2022-12-02 09:33:52 +01:00
Zhangrp
9cf3fa9711
Add docs for biluo_to_iob and iob_to_biluo. (#11901)
* Add docs for biluo_to_iob and iob_to_biluo.

* Fix typos.

* Remove redundant links.
2022-12-01 13:30:27 +01:00
Damian Romero
afd7a2476d
Fix typo in vocab.md table (#11908)
* Fix typo in vocab.md table

Fixes explosion/spaCy/#11907

* Reformat vocab.md with Prettier
2022-12-01 13:06:28 +01:00
Adriane Boyd
6f9d630f7e
Replace Pipe type with Callable in Language (#11803)
* Replace Pipe type with Callable in Language

* Use Callable[[Doc], Doc] in the docstrings
2022-11-29 13:20:08 +01:00
Paul O'Leary McCann
f1e0243450
Remove macro auc per type from textcat defaults (#11887)
This appears to have been added by mistake and never used. Removing it
does not break validation.
2022-11-29 11:50:23 +01:00
Adriane Boyd
e0d43557b7
Merge pull request #11871 from adrianeboyd/chore/v3.5.0
Prepare for v3.5.0
2022-11-29 11:41:32 +01:00
Adriane Boyd
1ebe7db07c
Support local filesystem remotes for projects (#11762)
* Support local filesystem remotes for projects

* Fix support for local filesystem remotes for projects
  * Use `FluidPath` instead of `Pathy` to support both filesystem and
    remote paths
  * Create missing parent directories if required for local filesystem
  * Add a more general `_file_exists` method to support both `Pathy`,
    `Path`, and `smart_open`-compatible URLs
* Add explicit `smart_open` dependency starting with support for
  `compression` flag
* Update `pathy` dependency to exclude older versions that aren't
  compatible with required `smart_open` version
* Update docs to refer to `Pathy` instead of `smart_open` for project
  remotes (technically you can still push to any `smart_open`-compatible
  path but you can't pull from them)
* Add tests for local filesystem remotes

* Update pathy for general BlobStat sorting

* Add import

* Remove _file_exists since only Pathy remotes are supported

* Format CLI docs

* Clean up merge
2022-11-29 11:40:58 +01:00