Raphael Mitsch
eef3d950b4
Fix SpanGroup and Span typing ( #12009 )
...
* Correct Span.label, Span.kb_id types. Fix SpanGroup.__iter__().
* Extend test.
* Rename test. Fix typo.
* Add comment.
* Fix types for Span.label, Span.kb_id, Span.char_span().
* Update spacy/tests/doc/test_span_group.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update docs.
* Fix typo.
* Update spacy/tokens/span_group.pyx
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-12-21 18:54:27 +01:00
Lj Miranda
8c4eee28bc
Better approach for handling zero suggestions
2022-12-21 20:01:02 +08:00
Lj Miranda
a3fad0b983
Handle zero suggestions to make tests pass
...
I'm not sure if this is the most elegant solution. But what should
happen is that the _make_span_group function MUST return an empty
SpanGroup if there are no suggestions.
The error happens when the 'scores' variable is empty. We cannot
get the 'predicted' and other downstream vars.
2022-12-21 10:36:01 +08:00
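A minimal sketch of the behaviour described in a3fad0b983, written as a plain-Python stand-in rather than spaCy's actual code. The helper name `make_span_group`, its parameters, the threshold and the tuple-based return value are all hypothetical. It only illustrates the early return for empty suggestions or scores, so that the later per-span selection never runs on empty input.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int, str]  # (start, end, label); hypothetical stand-in for a real Span


def make_span_group(
    suggestions: Sequence[Tuple[int, int]],
    scores: Sequence[Sequence[float]],
    labels: Sequence[str],
    threshold: float = 0.5,
) -> List[Span]:
    """Hypothetical stand-in for _make_span_group: one best label per suggestion."""
    # With no suggestions (or no scores) there is nothing to predict,
    # so return an empty group instead of indexing into empty data.
    if not suggestions or not scores:
        return []
    spans: List[Span] = []
    for (start, end), row in zip(suggestions, scores):
        if not row:
            continue
        # Pick the index of the highest-scoring label for this span.
        best = max(range(len(row)), key=row.__getitem__)
        if row[best] >= threshold:
            spans.append((start, end, labels[best]))
    return spans


if __name__ == "__main__":
    print(make_span_group([], [], ["PER", "ORG"]))
    print(make_span_group([(0, 2)], [[0.1, 0.9]], ["PER", "ORG"]))
```

The guard sits at the top so that every downstream step can assume non-empty input.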
kadarakos
c223cd7a86
Add apply CLI ( #11376 )
...
* annotate cli first try
* add batch-size and n_process
* rename to apply
* typing fix
* handle file suffixes
* walk directories
* support jsonl
* typing fix
* remove debug
* make suffix optional for walk
* revert unrelated
* don't warn but raise
* better error message
* minor touch up
* Update spacy/tests/test_cli.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update spacy/cli/apply.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/cli/apply.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* update tests and bugfix
* add force_overwrite
* typo
* fix adding .spacy suffix
* Update spacy/cli/apply.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/cli/apply.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/cli/apply.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* store user data and rename cmd arg
* include test for user attr
* rename cmd arg
* better help message
* documentation
* prettier
* black
* link fix
* Update spacy/cli/apply.py
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
* Update website/docs/api/cli.md
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
* Update website/docs/api/cli.md
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
* Update website/docs/api/cli.md
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
* addressing reviews
* dont quit but warn
* prettier
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2022-12-20 17:11:33 +01:00
Adriane Boyd
f476317387
Require thinc>=8.1.6 for serializable Softmax defaults
2022-12-20 14:41:40 +01:00
Jos Polfliet
18ffe5bbd6
Update stop_words.py ( #11997 )
...
fix typo in "aangaande"
2022-12-19 16:17:49 +01:00
cfuerbachersparks
3a2b655a29
Update lexeme.md ( #11994 )
...
Change suffix_ string to end
2022-12-19 10:33:38 +01:00
Adriane Boyd
c9d9d6847f
Update build constraints for python 3.11 ( #11981 )
2022-12-15 10:55:01 +01:00
Adriane Boyd
e5c7f3b077
CI: Install thinc-apple-ops through extra ( #11963 )
2022-12-12 10:13:10 +01:00
Lj Miranda
0336618eff
Merge branch 'master' into add/exclusive-spancat
2022-12-12 16:26:48 +08:00
Adriane Boyd
0591e67265
Cast to uint64 for all array-based doc representations ( #11933 )
...
* Convert all individual values explicitly to uint64 for array-based doc representations
* Temporarily test with latest numpy v1.24.0rc
* Remove unnecessary conversion from attr_t
* Reduce number of individual casts
* Convert specifically from int32 to uint64
* Revert "Temporarily test with latest numpy v1.24.0rc"
This reverts commit eb0e3c5006.
* Also use int32 in tests
2022-12-12 08:45:35 +01:00
Adriane Boyd
8c291ace0c
Extend to wasabi v1.1 ( #11945 )
...
* Extend to wasabi v1.1
* Temporarily run mypy and tests with newest wasabi
* Temporarily skip check requirements test
* Revert "Temporarily skip check requirements test"
This reverts commit 44f4ce20a8.
* Revert "Temporarily run mypy and tests with newest wasabi"
This reverts commit e677a2257c.
2022-12-12 08:38:36 +01:00
github-actions[bot]
f22fc7a113
Auto-format code with black ( #11955 )
...
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-12-09 10:15:52 +01:00
vincent d warmerdam
6d2ca1ab3a
Update custom solutions links ( #11903 )
...
* Update custom solutions
Will now point to https://explosion.ai/custom-solutions
* added-sidebar
* added-analysis-to-readme
* update-landing-page
2022-12-07 16:02:09 +01:00
Paul O'Leary McCann
73919336fb
Remove spacy-sentence-segmenter from Universe ( #11932 )
2022-12-07 15:56:03 +01:00
Paul O'Leary McCann
5c3a60e8f4
Add in errors used in the beam code that were removed at some point ( #11935 )
...
I don't think there's any way to use the beam code at the moment, but as
long as it's around, the errors it refers to should also be present.
2022-12-07 15:52:35 +01:00
Paul O'Leary McCann
916191848a
Update scattertext example code ( #11937 )
...
* Update scattertext example code
* Remove PMI Filter Threshold
2022-12-07 18:09:04 +09:00
Daniël de Kok
27fac7df2e
EditTreeLemmatizer: correctly add strings when initializing from labels ( #11934 )
...
Strings in replacement nodes were not added to the `StringStore`
when `EditTreeLemmatizer` was initialized from a set of labels. The
corresponding test did not capture this because it added the strings
through the examples that were passed to the initialization.
This change fixes both this bug in the initialization and the 'shadowing'
of the bug in the test.
2022-12-07 13:53:41 +09:00
Zhangrp
23085ffef4
Fix interpolation in directory names, see #11235 . ( #11914 )
2022-12-06 17:42:12 +09:00
Ryn Daniels
1aadcfcb37
update lock-threads to v4 ( #11930 )
2022-12-05 10:17:10 +01:00
Adriane Boyd
8afa8b5a7b
Refactor kwargs in CLI msg for future wasabi compatibility ( #11918 )
...
Necessary for mypy with wasabi v1+.
2022-12-05 10:00:00 +01:00
Darigov Research
6f342bdd72
docs: Adds link to license in readme ( #11924 )
...
Would resolve https://github.com/explosion/spaCy/issues/11923 if merged
2022-12-05 09:49:04 +01:00
Paul O'Leary McCann
5848656b5e
Switch ubuntu-latest to ubuntu-20.04 in main tests ( #11928 )
...
* Switch ubuntu-latest to ubuntu-20.04 in main tests
* Only use 20.04 for 3.6
2022-12-05 09:43:23 +01:00
Sofie Van Landeghem
4b2097a271
fix links ( #11927 )
2022-12-05 16:29:13 +09:00
Lj Miranda
9e88108298
Remove init_W and init_B parameters
...
This commit is expected to fail until the new Thinc release.
2022-12-05 08:13:59 +08:00
github-actions[bot]
df0cb4b77b
Auto-format code with black ( #11913 )
...
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2022-12-02 14:49:12 +01:00
Paul O'Leary McCann
f9d17a644b
Config generation fails for GPU without transformers ( #11899 )
...
If you don't have spacy-transformers installed, but try to use `init
config` with the GPU flag, you'll get an error. The issue is that the
`use_transformers` flag in the config is conflated with the GPU flag,
and then there's an attempt to access transformers config info that may
not exist.
There may be a better way to do this, but this stops the error.
2022-12-02 10:17:11 +01:00
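A minimal sketch of the idea behind f9d17a644b, assuming the fix amounts to keeping the GPU flag and the availability of spacy-transformers as two separate conditions. The function names below are hypothetical and are not taken from spaCy's `init config` code. Only `importlib.util.find_spec` is a real standard-library call.

```python
import importlib.util


def has_spacy_transformers() -> bool:
    """Return True if the spacy_transformers package can be found."""
    return importlib.util.find_spec("spacy_transformers") is not None


def resolve_use_transformer(gpu: bool, transformers_available: bool) -> bool:
    """Hypothetical helper: use a transformer only if both conditions hold.

    Treating the GPU flag alone as "use transformers" is the conflation the
    commit message describes; here the two inputs stay independent.
    """
    return gpu and transformers_available


if __name__ == "__main__":
    # On a machine without spacy-transformers this prints False even with gpu=True.
    print(resolve_use_transformer(True, has_spacy_transformers()))
```

Passing availability in as an argument keeps the decision testable without installing the optional package.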
Adriane Boyd
445c670a2d
Fix spancat for zero suggestions ( #11860 )
...
* Add test for spancat predict with zero suggestions
* Fix spancat for zero suggestions
* Undo changes to extract_spans
* Use .sum() as in update
2022-12-02 09:33:52 +01:00
Zhangrp
9cf3fa9711
Add docs for biluo_to_iob and iob_to_biluo. ( #11901 )
...
* Add docs for biluo_to_iob and iob_to_biluo.
* Fix typos.
* Remove redundant links.
2022-12-01 13:30:27 +01:00
Damian Romero
afd7a2476d
Fix typo in vocab.md table ( #11908 )
...
* Fix typo in vocab.md table
Fixes explosion/spaCy/#11907
* Reformat vocab.md with Prettier
2022-12-01 13:06:28 +01:00
Lj Miranda
6a10d56caf
Update spancat_exclusive docstring
2022-11-30 15:43:49 +08:00
Adriane Boyd
6f9d630f7e
Replace Pipe type with Callable in Language ( #11803 )
...
* Replace Pipe type with Callable in Language
* Use Callable[[Doc], Doc] in the docstrings
2022-11-29 13:20:08 +01:00
Paul O'Leary McCann
f1e0243450
Remove macro auc per type from textcat defaults ( #11887 )
...
This appears to have been added by mistake and never used. Removing it
does not break validation.
2022-11-29 11:50:23 +01:00
Adriane Boyd
e0d43557b7
Merge pull request #11871 from adrianeboyd/chore/v3.5.0
...
Prepare for v3.5.0
2022-11-29 11:41:32 +01:00
Adriane Boyd
1ebe7db07c
Support local filesystem remotes for projects ( #11762 )
...
* Support local filesystem remotes for projects
* Fix support for local filesystem remotes for projects
* Use `FluidPath` instead of `Pathy` to support both filesystem and
remote paths
* Create missing parent directories if required for local filesystem
* Add a more general `_file_exists` method to support both `Pathy`,
`Path`, and `smart_open`-compatible URLs
* Add explicit `smart_open` dependency starting with support for
`compression` flag
* Update `pathy` dependency to exclude older versions that aren't
compatible with required `smart_open` version
* Update docs to refer to `Pathy` instead of `smart_open` for project
remotes (technically you can still push to any `smart_open`-compatible
path but you can't pull from them)
* Add tests for local filesystem remotes
* Update pathy for general BlobStat sorting
* Add import
* Remove _file_exists since only Pathy remotes are supported
* Format CLI docs
* Clean up merge
2022-11-29 11:40:58 +01:00
Lj Miranda
14bf26d3e6
Merge branch 'add/exclusive-spancat' of github.com:ljvmiranda921/spaCy into add/exclusive-spancat
2022-11-29 11:37:16 +08:00
Lj Miranda
a1be07e2da
Put back initializers in spancat config
...
Whenever I remove model.scorer.init_w and model.scorer.init_b,
I encounter an error in the test:
SystemError: <method '__getitem__' of 'dict' objects> returned a result
with an error set.
My Thinc version is 8.1.5, but I can't seem to check what's causing the
error.
2022-11-29 11:32:38 +08:00
Lj Miranda
8138e49764
Update defaults for number of rows
...
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-11-29 11:26:04 +08:00
Lj Miranda
4ab27d4517
Use a single variable for tests
2022-11-29 11:25:35 +08:00
Lj Miranda
ac0ac3eb99
Fix documentation API
2022-11-29 11:19:10 +08:00
Lj Miranda
616723e902
Merge branch 'add/exclusive-spancat' of github.com:ljvmiranda921/spaCy into add/exclusive-spancat
2022-11-29 11:15:15 +08:00
Lj Miranda
0b32a949f1
Remove mypy ignore and typecast labels to list
2022-11-29 11:14:43 +08:00
Lj Miranda
14ae4a52c0
Clarify docstring for Exclusive_SpanCategorizer
2022-11-29 11:11:26 +08:00
Lj Miranda
29f156aa1a
Update documentation
...
Update grammar and usage
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2022-11-29 11:06:35 +08:00
Lj Miranda
bd0562e609
Use DEFAULT_EXCL_SPANCAT_MODEL
...
I also renamed spancat_exclusive_default_config to
spancat_excl_default_config because black otherwise makes some
unattractive formatting changes.
2022-11-29 11:01:18 +08:00
Lj Miranda
d090ed404e
Remove initializers in default config
2022-11-29 10:56:53 +08:00
Lj Miranda
270db33dcf
Turn on formatting for allow_extra_label
2022-11-29 10:56:11 +08:00
Sofie Van Landeghem
96c9cf3448
Merge pull request #11855 from essenmitsosse/move-styleguide-out-of-readme
...
Move Styleguide out of Readme
2022-11-28 21:22:56 +01:00
Paul O'Leary McCann
f54bfb56c9
Don't throw an error if using displacy on an unset span key ( #11845 )
...
* Don't throw an error if using displacy on an unset span key
* List available keys in W117
2022-11-28 10:01:09 +01:00
Zhangrp
9f986af120
Add example sentence for Chinese in website meta ( #11879 )
2022-11-28 14:50:30 +09:00