Will Frey
8d4129e177
Fix invalid ConsoleLogger.v3 example config ( #12498 )
...
Replace `progress_bar = "all_steps"` with `progress_bar = "eval"`, which is consistent with the default behavior for `spacy.ConsoleLogger.v1` and `spacy.ConsoleLogger.v2`.
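As an illustration of the corrected setting, the following minimal sketch parses a logger block like the one in the example config. It uses only the Python standard library, not spaCy's own config loader. The `[training.logger]` section name and the `read_logger_settings` helper are assumptions made for illustration; only `progress_bar = "eval"` is taken from this commit.

```python
import configparser
import json

# Hypothetical excerpt of a training config. Only the `progress_bar = "eval"`
# line comes from the commit message; the rest is assumed for illustration.
CONFIG_TEXT = """
[training.logger]
@loggers = "spacy.ConsoleLogger.v3"
progress_bar = "eval"
"""


def read_logger_settings(text):
    """Parse the logger section and decode each JSON-style value."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep keys exactly as written
    parser.read_string(text)
    section = parser["training.logger"]
    return {key: json.loads(value) for key, value in section.items()}


settings = read_logger_settings(CONFIG_TEXT)
print(settings["progress_bar"])
```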
2023-04-04 20:53:07 +02:00
Edward
de32011e4c
Add model-last saving mechanism to pretraining ( #12459 )
...
* Adjust pretrain command
* change naming and add finally block
* Add unit test
* Add unit test assertions
* Update spacy/training/pretrain.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* change finally block
* Add to docs
* Update website/docs/usage/embeddings-transformers.mdx
* Add flag to skip saving model-last
---------
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-04-03 15:24:03 +02:00
Adriane Boyd
4a1ec332de
Add Span.kb_id/Span.id strings to Doc/DocBin serialization if set ( #12493 )
...
* Add Span.kb_id/Span.id strings to Doc/DocBin serialization if set
* Format
2023-04-03 15:11:12 +02:00
Adriane Boyd
4538ceb507
Remove redundant strings.add for Doc.char_span ( #12429 )
2023-04-03 11:38:56 +02:00
Adriane Boyd
476a2e7a0a
Allow cupy 12.0 for extras ( #12490 )
2023-03-31 13:48:15 +02:00
Adriane Boyd
69e20ce03d
Fix pickle for ngram suggester ( #12486 )
2023-03-31 13:43:51 +02:00
Adriane Boyd
140d53649d
Convert values to numpy for label smoothing tests ( #12472 )
2023-03-31 13:41:41 +02:00
Ye Lei (叶磊)
ce258670b7
Allow passing a Span to displacy.parse_deps ( #12477 )
...
* Allow passing a Span to displacy.parse_deps
* Update docstring
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update API docs
---------
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-03-31 09:44:01 +02:00
Daniël de Kok
b734e5314d
Avoid TrainablePipe.finish_update getting called twice during training ( #12450 )
...
* Avoid `TrainablePipe.finish_update` getting called twice during training
PR #12136 fixed an issue where the tok2vec pipe was updated before
gradients were accumulated. However, it introduced a new bug that causes
`finish_update` to be called twice when using the training loop. This
causes a fairly large slowdown.
The `Language.update` method accepts the `sgd` argument for passing an
optimizer. This argument has three possible values:
- `Optimizer`: use the given optimizer to finish pipe updates.
- `None`: use a default optimizer to finish pipe updates.
- `False`: do not finish pipe updates.
However, the last option was not documented and not valid with the
existing type of `sgd`. I assumed that this was a remnant of earlier
spaCy versions and removed handling of `False`.
With that change, however, we were passing `None` to `Language.update`.
As a result, we were calling `finish_update` both in `Language.update`
and in the training loop after all subbatches were processed.
This change restores proper handling/use of `False`. Moreover, the role
of `False` is now documented and added to the type to avoid future
accidents.
* Fix typo
* Document defaults for `Language.update`
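The double call described above can be sketched with a toy model. This is not spaCy's implementation: `ToyPipe`, `toy_update` and `toy_train_step` are hypothetical names, and the optimizers are plain strings. The sketch only mirrors the three `sgd` cases from the commit message and counts how often `finish_update` runs in one training step.

```python
class ToyPipe:
    """Hypothetical stand-in for a trainable pipe; counts optimizer steps."""

    def __init__(self):
        self.pending = 0       # gradients accumulated since the last step
        self.finish_calls = 0  # how often finish_update ran

    def accumulate(self, subbatch):
        self.pending += len(subbatch)

    def finish_update(self, optimizer):
        self.finish_calls += 1
        self.pending = 0


def toy_update(pipe, subbatch, sgd=None):
    """Toy version of the three `sgd` cases: optimizer, None, False."""
    pipe.accumulate(subbatch)
    if sgd is False:
        return  # do not finish the update; the caller does it later
    optimizer = "default-optimizer" if sgd is None else sgd
    pipe.finish_update(optimizer)


def toy_train_step(subbatches, sgd_for_update):
    """Process all subbatches, then finish the update once in the loop."""
    pipe = ToyPipe()
    for subbatch in subbatches:
        toy_update(pipe, subbatch, sgd=sgd_for_update)
    pipe.finish_update("loop-optimizer")
    return pipe.finish_calls


calls_with_none = toy_train_step([[1, 2, 3]], sgd_for_update=None)
calls_with_false = toy_train_step([[1, 2, 3]], sgd_for_update=False)
print(calls_with_none, calls_with_false)
```

With a single subbatch, passing `None` finishes the update twice in this toy loop, while passing `False` leaves the only call to the loop itself.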
2023-03-30 09:30:42 +02:00
Raphael Mitsch
d85df9d577
Fix Span.sents for edge case of Span being the only Span in the last sentence of a Doc. ( #12484 )
2023-03-29 18:54:47 +02:00
kadarakos
372a90885e
Fix spancat-singlelabel score ( #12469 )
...
* debug argmax sort and add span scores
* add missing tests for spanscores
2023-03-29 08:38:11 +02:00
Edward
dba4e7bece
Add info to stringstore and vocab ( #12471 )
2023-03-27 13:15:14 +02:00
Adriane Boyd
2fba21be63
Restrict github workflows to explosion ( #12470 )
2023-03-27 12:44:04 +02:00
sloev / Johannes Valbjørn
fd072533e7
add spacy_onnx_sentiment_english to universe ( #12422 )
...
* add spacy_onnx_sentiment_english to universe
* rename to sentimental-onix
* fix comma json error
* fix typo
* typo fix
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* mention need to download model before example works
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
---------
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-03-27 11:35:14 +02:00
Prajakta Darade
ae7779e830
corrected example code ( #12466 )
2023-03-27 11:32:49 +02:00
kadarakos
d1474fdd91
add explanation about overwriting behaviour ( #12464 )
...
* add explanation about overwriting behaviour
* Update website/docs/api/spancategorizer.mdx
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update website/docs/api/spancategorizer.mdx
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update website/docs/api/spancategorizer.mdx
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* format
---------
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-03-27 10:27:11 +02:00
Edward
a653dec654
Add info that Vocab and StringStore are not static in docs ( #12427 )
...
* Add size increase info about vocab and stringstore
* Update website/docs/api/stringstore.mdx
Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
* Update website/docs/api/vocab.mdx
Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
* Change wording
---------
Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
2023-03-27 09:18:23 +02:00
Adriane Boyd
fac457a509
Support floret for PretrainVectors ( #12435 )
...
* Support floret for PretrainVectors
* Format
2023-03-24 16:28:51 +01:00
Adriane Boyd
d0bd3f5ee4
Update Serbian tokenization for UD Serbian SET ( #12442 )
2023-03-24 16:26:40 +01:00
Vinit Ravishankar
28de85737f
Tagger label smoothing ( #12293 )
...
* add label smoothing
* use True/False instead of floats
* add entropy to debug data
* formatting
* docs
* change test to check difference in distributions
* Update website/docs/api/tagger.mdx
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* Update spacy/pipeline/tagger.pyx
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* bool -> float
* update docs
* fix seed
* black
* update tests to use label_smoothing = 0.0
* set default to 0.0, update quickstart
* Update spacy/pipeline/tagger.pyx
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
* update morphologizer, tagger test
* fix morph docs
* add url to docs
---------
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
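For readers unfamiliar with the technique, here is a minimal sketch of label smoothing on a one-hot target in plain Python. It uses the common formulation that mixes the one-hot vector with a uniform distribution, which is not necessarily the exact variant used by the tagger. The `smooth_targets` helper is a hypothetical name; the `label_smoothing` parameter follows the commit's float setting with a default of 0.0.

```python
def smooth_targets(num_classes, true_index, label_smoothing=0.0):
    """Return a smoothed target distribution for one gold label.

    Common formulation (an assumption, not taken from the commit):
    (1 - eps) * one_hot + eps / num_classes.
    """
    if not 0.0 <= label_smoothing < 1.0:
        raise ValueError("label_smoothing must be in [0.0, 1.0)")
    uniform_share = label_smoothing / num_classes
    return [
        (1.0 - label_smoothing) + uniform_share if i == true_index else uniform_share
        for i in range(num_classes)
    ]


unsmoothed = smooth_targets(4, true_index=2)
smoothed = smooth_targets(4, true_index=2, label_smoothing=0.1)
print(unsmoothed)
print(smoothed)
```

With the default of 0.0 the target stays a plain one-hot vector, so existing behaviour is unchanged unless the setting is raised.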
2023-03-22 12:17:56 +01:00
Ines Montani
b479f8bfa5
Add user survey alert to the top ( #12452 )
...
* Add user survey alert to the top
* Shorter
---------
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-03-22 11:09:37 +01:00
Raphael Mitsch
3102e2e27a
Entity linking: use SpanGroup instead of Iterable[Span] for mentions ( #12344 )
...
* Convert Candidate from Cython to Python class.
* Format.
* Fix .entity_ typo in _add_activations() usage.
* Change the type of the mentions for which entity candidates are looked up from Iterable[Span] to SpanGroup.
* Update docs.
* Update spacy/kb/candidate.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update doc string of BaseCandidate.__init__().
* Update spacy/kb/candidate.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Rename Candidate to InMemoryCandidate, BaseCandidate to Candidate.
* Adjust Candidate to support and mandate numerical entity IDs.
* Format.
* Fix docstring and docs.
* Update website/docs/api/kb.mdx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Rename alias -> mention.
* Refactor Candidate attribute names. Update docs and tests accordingly.
* Refactor Candidate attributes and their usage.
* Format.
* Fix mypy error.
* Update error code in line with v4 convention.
* Reverse erroneous changes during merge.
* Update return type in EL tests.
* Re-add Candidate to setup.py.
* Format updated docs.
---------
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-03-20 12:25:18 +01:00
Raphael Mitsch
e5be5d6092
Merge branch 'v4' into feature/docwise-generator-batching
...
# Conflicts:
# spacy/kb/kb.pyx
# spacy/kb/kb_in_memory.pyx
# spacy/ml/models/entity_linker.py
# spacy/pipeline/entity_linker.py
# spacy/tests/pipeline/test_entity_linker.py
# website/docs/api/inmemorylookupkb.mdx
# website/docs/api/kb.mdx
2023-03-20 10:50:54 +01:00
Raphael Mitsch
cb79af3a10
Fix merge leftovers.
2023-03-20 10:31:11 +01:00
Raphael Mitsch
73bdeb01e4
Merge branch 'refactor/el-candidates' into feature/docwise-generator-batching
...
# Conflicts:
# spacy/kb/candidate.py
# spacy/kb/kb.pyx
# spacy/kb/kb_in_memory.pyx
# spacy/ml/models/entity_linker.py
# spacy/pipeline/entity_linker.py
# spacy/tests/pipeline/test_entity_linker.py
# website/docs/api/inmemorylookupkb.mdx
# website/docs/api/kb.mdx
2023-03-20 10:24:17 +01:00
Raphael Mitsch
9340eb8ad2
Introduce hierarchy for EL Candidate objects ( #12341 )
...
* Convert Candidate from Cython to Python class.
* Format.
* Fix .entity_ typo in _add_activations() usage.
* Update spacy/kb/candidate.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update doc string of BaseCandidate.__init__().
* Update spacy/kb/candidate.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Rename Candidate to InMemoryCandidate, BaseCandidate to Candidate.
* Adjust Candidate to support and mandate numerical entity IDs.
* Format.
* Fix docstring and docs.
* Update website/docs/api/kb.mdx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Rename alias -> mention.
* Refactor Candidate attribute names. Update docs and tests accordingly.
* Refactor Candidate attributes and their usage.
* Format.
* Fix mypy error.
* Update error code in line with v4 convention.
* Update spacy/kb/candidate.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Updated error code.
* Simplify interface for int/str representations.
* Update website/docs/api/kb.mdx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Rename 'alias' to 'mention'.
* Port Candidate and InMemoryCandidate to Cython.
* Remove redundant entry in setup.py.
* Add abstract class check.
* Drop storing mention.
* Update spacy/kb/candidate.pxd
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Fix entity_id refactoring problems in docstrings.
* Drop unused InMemoryCandidate._entity_hash.
* Update docstrings.
* Move attributes out of Candidate.
* Partially fix alias/mention terminology usage. Convert Candidate to interface.
* Remove prior_prob from supported properties in Candidate. Introduce KnowledgeBase.supports_prior_probs().
* Update docstrings related to prior_prob.
* Update alias/mention usage in doc(strings).
* Update spacy/ml/models/entity_linker.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/ml/models/entity_linker.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Mention -> alias renaming. Drop Candidate.mentions(). Drop InMemoryLookupKB.get_alias_candidates() from docs.
* Update docstrings.
* Fix InMemoryCandidate attribute names.
* Update spacy/kb/kb.pyx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/ml/models/entity_linker.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update W401 test.
* Update spacy/errors.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/kb/kb.pyx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Use Candidate output type for toy generators in the test suite to mimic best practices
* fix docs
* fix import
---------
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-03-20 00:34:35 +01:00
Adriane Boyd
6ae7618418
Clean up Vocab constructor ( #12290 )
...
* Clean up Vocab constructor
* Change effective type of `strings` from `Iterable[str]` to `Optional[StringStore]`
* Don't automatically add strings to vocab
* Change default values to `None`
* Remove `**deprecated_kwargs`
* Format
2023-03-19 23:41:20 +01:00
Sofie Van Landeghem
b83407388a
fix import
2023-03-19 23:34:00 +01:00
Sofie Van Landeghem
0365d3d2e2
fix docs
2023-03-19 23:31:02 +01:00
Sofie Van Landeghem
9e71adc074
Use Candidate output type for toy generators in the test suite to mimic best practices
2023-03-19 23:27:20 +01:00
Raphael Mitsch
faede7155c
Update spacy/kb/kb.pyx
...
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-03-17 11:32:41 +01:00
Raphael Mitsch
4d8dce5ba2
Update spacy/errors.py
...
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-03-17 11:28:18 +01:00
Adriane Boyd
54c614e116
CI: Separate spacy universe validation into a separate workflow ( #12440 )
...
* Separate spacy universe validation into a separate workflow
* Fix new workflow name
2023-03-17 10:59:53 +01:00
Adriane Boyd
5f72d6c836
CI: Switch PR back to paths-ignore ( #12438 )
...
Switch PR tests back to paths-ignore but include changes to `.github`
for all PRs rather than trying to figure out complicated
includes+excludes. Changes to `.github` are relatively rare and should
not be a huge burden for the CI.
2023-03-17 10:01:49 +01:00
Adriane Boyd
4c5a3a2a7b
Remove autoblack workflow ( #12437 )
...
Now that all PRs have `black` formatting validation, we no longer need the
autoblack workflow.
2023-03-17 09:35:00 +01:00
Raphael Mitsch
2377b67f81
Update W401 test.
2023-03-17 08:59:52 +01:00
Raphael Mitsch
307bbab285
Update spacy/ml/models/entity_linker.py
...
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-03-17 08:58:28 +01:00
Raphael Mitsch
978fbdcee1
Update spacy/kb/kb.pyx
...
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-03-17 08:58:17 +01:00
Raphael Mitsch
830939ee64
Fix InMemoryCandidate attribute names.
2023-03-15 10:51:34 +01:00
Raphael Mitsch
80fb0666b9
Update docstrings.
2023-03-15 09:25:41 +01:00
Raphael Mitsch
3cfc1c6acc
Mention -> alias renaming. Drop Candidate.mentions(). Drop InMemoryLookupKB.get_alias_candidates() from docs.
2023-03-15 09:23:31 +01:00
Raphael Mitsch
961795d9f1
Update spacy/ml/models/entity_linker.py
...
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-03-15 09:20:25 +01:00
Raphael Mitsch
b7b4282821
Update spacy/ml/models/entity_linker.py
...
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-03-15 09:20:07 +01:00
Raphael Mitsch
96b61d0671
Fix EL failure with sentence-crossing entities ( #12398 )
...
* Add test reproducing EL failure in sentence-crossing entities.
* Format.
* Draft fix.
* Format.
* Fix case for len(ent.sents) == 1.
* Format.
* Format.
* Format.
* Fix mypy error.
* Merge EL sentence crossing tests.
* Remove unneeded sentencizer component.
* Fix or ignore mypy issues in test.
* Simplify ent.sents handling.
* Format. Update assert in ent.sents handling.
* Small rewrite
---------
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-03-14 22:02:49 +01:00
Adriane Boyd
2ce9a220db
Fix --verbose for spacy find-threshold ( #12418 )
2023-03-14 17:16:49 +01:00
Adriane Boyd
377f601bff
CI: Add all paths before excluding patterns ( #12419 )
2023-03-14 16:06:08 +01:00
Raphael Mitsch
28dbed64cb
Update alias/mention usage in doc(strings).
2023-03-14 13:33:05 +01:00
Raphael Mitsch
e8cab4625c
Fix sentence indexing bug in Span.sents ( #12405 )
...
* Add test for partial sentences in ent.sents.
* Removed unneeded import.
* Format. Simplify code.
2023-03-14 10:21:53 +01:00
Raphael Mitsch
be858981e6
Update docstrings related to prior_prob.
2023-03-13 17:01:20 +01:00
Raphael Mitsch
4a921766f1
Remove prior_prob from supported properties in Candidate. Introduce KnowledgeBase.supports_prior_probs().
2023-03-13 16:54:38 +01:00