Raphael Mitsch
1ece9ec30b
Update website/docs/api/inmemorylookupkb.mdx
...
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-04-24 20:44:57 +02:00
Raphael Mitsch
10ddefa686
Update spacy/kb/kb_in_memory.pyx
...
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-04-24 20:44:37 +02:00
Raphael Mitsch
fb79b52c73
Update website/docs/api/entitylinker.mdx
...
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-04-24 20:44:18 +02:00
Raphael Mitsch
3ae31f7bd2
Update website/docs/api/kb.mdx
...
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-04-24 20:44:08 +02:00
Raphael Mitsch
9b677adb7a
Update spacy/kb/kb.pyx
...
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-04-24 20:43:56 +02:00
Raphael Mitsch
571eaf6238
Update spacy/kb/kb.pyx
...
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-04-24 20:42:59 +02:00
Raphael Mitsch
49747697a2
Merge branch 'v4' into feature/docwise-generator-batching
...
# Conflicts:
# spacy/kb/kb.pyx
# spacy/ml/models/entity_linker.py
# spacy/pipeline/entity_linker.py
# website/docs/api/inmemorylookupkb.mdx
# website/docs/api/kb.mdx
2023-04-17 16:28:09 +02:00
Adriane Boyd
5d0f48fe69
Enforce that Span.start/end(_char) remain valid and in sync (#12268)
...
* Enforce that Span.start/end(_char) remain valid and in sync
Allowing span attributes to be writable starting in v3 has made it
possible for the internal `Span.start/end/start_char/end_char` to get
out-of-sync or have invalid values.
This checks that the values are valid and syncs the token and char
offsets if any attributes are modified directly. It does not yet handle
the case where the underlying doc is modified.
* Format
2023-04-06 16:01:59 +02:00
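The validity and sync checks described in the commit above can be illustrated with a minimal sketch. This is not spaCy's implementation: `ToyToken` and `ToySpan` are hypothetical names, and the sketch keeps the offsets in sync by deriving character offsets from token offsets, which may differ from how the library does it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyToken:
    """Hypothetical token: character offset and length within the text."""
    idx: int
    length: int


class ToySpan:
    """Hypothetical span whose token and char offsets cannot drift apart."""

    def __init__(self, tokens: List[ToyToken], start: int, end: int) -> None:
        self._tokens = tokens
        self._start = 0
        self._end = 0
        self._set_bounds(start, end)

    def _set_bounds(self, start: int, end: int) -> None:
        # Reject bounds that fall outside the doc or invert the span.
        if not (0 <= start <= end <= len(self._tokens)):
            raise IndexError(f"Invalid span bounds: start={start}, end={end}")
        self._start, self._end = start, end

    @property
    def start(self) -> int:
        return self._start

    @start.setter
    def start(self, value: int) -> None:
        self._set_bounds(value, self._end)

    @property
    def end(self) -> int:
        return self._end

    @end.setter
    def end(self, value: int) -> None:
        self._set_bounds(self._start, value)

    @property
    def start_char(self) -> int:
        # Derived from the token offsets on every access.
        if self._start < len(self._tokens):
            return self._tokens[self._start].idx
        if self._tokens:
            last = self._tokens[-1]
            return last.idx + last.length
        return 0

    @start_char.setter
    def start_char(self, value: int) -> None:
        # Writing a char offset updates the matching token offset.
        for i, token in enumerate(self._tokens):
            if token.idx == value:
                self._set_bounds(i, self._end)
                return
        raise ValueError(f"No token starts at character offset {value}")

    @property
    def end_char(self) -> int:
        if self._end > self._start:
            last = self._tokens[self._end - 1]
            return last.idx + last.length
        return self.start_char
```

As in the commit, the sketch does not handle the case where the underlying token list is modified after the span is created.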
Daniël de Kok
b734e5314d
Avoid TrainablePipe.finish_update getting called twice during training (#12450)
...
* Avoid `TrainablePipe.finish_update` getting called twice during training
PR #12136 fixed an issue where the tok2vec pipe was updated before
gradients were accumulated. However, it introduced a new bug that causes
`finish_update` to be called twice when using the training loop. This
causes a fairly large slowdown.
The `Language.update` method accepts the `sgd` argument for passing an
optimizer. This argument has three possible values:
- `Optimizer`: use the given optimizer to finish pipe updates.
- `None`: use a default optimizer to finish pipe updates.
- `False`: do not finish pipe updates.
However, the latter option was not documented and not valid with the
existing type of `sgd`. I assumed that this was a remnant of earlier
spaCy versions and removed handling of `False`.
However, with that change, we are passing `None` to `Language.update`.
As a result, we were calling `finish_update` in both `Language.update`
and in the training loop after all subbatches are processed.
This change restores proper handling/use of `False`. Moreover, the role
of `False` is now documented and added to the type to avoid future
accidents.
* Fix typo
* Document defaults for `Language.update`
2023-03-30 09:30:42 +02:00
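The three values of `sgd` described in the commit above can be sketched with toy classes. This is a simplified illustration, not spaCy's code: `ToyOptimizer`, `ToyPipe`, `update` and `train_step` are hypothetical stand-ins for the optimizer, a trainable pipe, `Language.update` and the training loop.

```python
from typing import List, Literal, Union


class ToyOptimizer:
    """Hypothetical stand-in for an optimizer."""


class ToyPipe:
    """Hypothetical pipe that counts how often its update is finished."""

    def __init__(self) -> None:
        self.accumulated = 0
        self.finished = 0

    def update(self, batch: List[str]) -> None:
        # Accumulate "gradients" without applying them.
        self.accumulated += len(batch)

    def finish_update(self, sgd: ToyOptimizer) -> None:
        # Apply the accumulated "gradients" and reset them.
        self.finished += 1
        self.accumulated = 0


def update(
    pipe: ToyPipe,
    batch: List[str],
    sgd: Union[ToyOptimizer, None, Literal[False]] = None,
) -> None:
    """Toy update: `sgd` is an optimizer, None (default optimizer) or False."""
    pipe.update(batch)
    if sgd is False:
        # The caller is responsible for finishing the update.
        return
    if sgd is None:
        sgd = ToyOptimizer()
    pipe.finish_update(sgd)


def train_step(
    pipe: ToyPipe, subbatches: List[List[str]], optimizer: ToyOptimizer
) -> None:
    """Toy training step: accumulate over all subbatches, then finish once."""
    for subbatch in subbatches:
        update(pipe, subbatch, sgd=False)
    pipe.finish_update(optimizer)
```

Passing `None` instead of `False` inside `train_step` would finish the update inside `update` as well as after the loop, which is the duplication the commit removes.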
Edward
a653dec654
Add info that Vocab and StringStore are not static in docs (#12427)
...
* Add size increase info about vocab and stringstore
* Update website/docs/api/stringstore.mdx
Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
* Update website/docs/api/vocab.mdx
Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
* Change wording
---------
Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
2023-03-27 09:18:23 +02:00
Raphael Mitsch
3102e2e27a
Entity linking: use SpanGroup instead of Iterable[Span] for mentions (#12344)
...
* Convert Candidate from Cython to Python class.
* Format.
* Fix .entity_ typo in _add_activations() usage.
* Change the type of the mentions for which entity candidates are looked up from Iterable[Span] to SpanGroup.
* Update docs.
* Update spacy/kb/candidate.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update doc string of BaseCandidate.__init__().
* Update spacy/kb/candidate.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Rename Candidate to InMemoryCandidate, BaseCandidate to Candidate.
* Adjust Candidate to support and mandate numerical entity IDs.
* Format.
* Fix docstring and docs.
* Update website/docs/api/kb.mdx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Rename alias -> mention.
* Refactor Candidate attribute names. Update docs and tests accordingly.
* Refactor Candidate attributes and their usage.
* Format.
* Fix mypy error.
* Update error code in line with v4 convention.
* Reverse erroneous changes during merge.
* Update return type in EL tests.
* Re-add Candidate to setup.py.
* Format updated docs.
---------
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-03-20 12:25:18 +01:00
Raphael Mitsch
e5be5d6092
Merge branch 'v4' into feature/docwise-generator-batching
...
# Conflicts:
# spacy/kb/kb.pyx
# spacy/kb/kb_in_memory.pyx
# spacy/ml/models/entity_linker.py
# spacy/pipeline/entity_linker.py
# spacy/tests/pipeline/test_entity_linker.py
# website/docs/api/inmemorylookupkb.mdx
# website/docs/api/kb.mdx
2023-03-20 10:50:54 +01:00
Raphael Mitsch
cb79af3a10
Fix merge leftovers.
2023-03-20 10:31:11 +01:00
Raphael Mitsch
73bdeb01e4
Merge branch 'refactor/el-candidates' into feature/docwise-generator-batching
...
# Conflicts:
# spacy/kb/candidate.py
# spacy/kb/kb.pyx
# spacy/kb/kb_in_memory.pyx
# spacy/ml/models/entity_linker.py
# spacy/pipeline/entity_linker.py
# spacy/tests/pipeline/test_entity_linker.py
# website/docs/api/inmemorylookupkb.mdx
# website/docs/api/kb.mdx
2023-03-20 10:24:17 +01:00
Raphael Mitsch
9340eb8ad2
Introduce hierarchy for EL Candidate objects (#12341)
...
* Convert Candidate from Cython to Python class.
* Format.
* Fix .entity_ typo in _add_activations() usage.
* Update spacy/kb/candidate.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update doc string of BaseCandidate.__init__().
* Update spacy/kb/candidate.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Rename Candidate to InMemoryCandidate, BaseCandidate to Candidate.
* Adjust Candidate to support and mandate numerical entity IDs.
* Format.
* Fix docstring and docs.
* Update website/docs/api/kb.mdx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Rename alias -> mention.
* Refactor Candidate attribute names. Update docs and tests accordingly.
* Refactor Candidate attributes and their usage.
* Format.
* Fix mypy error.
* Update error code in line with v4 convention.
* Update spacy/kb/candidate.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Updated error code.
* Simplify interface for int/str representations.
* Update website/docs/api/kb.mdx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Rename 'alias' to 'mention'.
* Port Candidate and InMemoryCandidate to Cython.
* Remove redundant entry in setup.py.
* Add abstract class check.
* Drop storing mention.
* Update spacy/kb/candidate.pxd
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Fix entity_id refactoring problems in docstrings.
* Drop unused InMemoryCandidate._entity_hash.
* Update docstrings.
* Move attributes out of Candidate.
* Partially fix alias/mention terminology usage. Convert Candidate to interface.
* Remove prior_prob from supported properties in Candidate. Introduce KnowledgeBase.supports_prior_probs().
* Update docstrings related to prior_prob.
* Update alias/mention usage in doc(strings).
* Update spacy/ml/models/entity_linker.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/ml/models/entity_linker.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Mention -> alias renaming. Drop Candidate.mentions(). Drop InMemoryLookupKB.get_alias_candidates() from docs.
* Update docstrings.
* Fix InMemoryCandidate attribute names.
* Update spacy/kb/kb.pyx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/ml/models/entity_linker.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update W401 test.
* Update spacy/errors.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update spacy/kb/kb.pyx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Use Candidate output type for toy generators in the test suite to mimic best practices
* fix docs
* fix import
---------
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-03-20 00:34:35 +01:00
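The candidate hierarchy described in the commit above (an abstract candidate interface, an in-memory subclass that carries a prior probability, and a knowledge base that reports whether it supports prior probabilities) might look roughly like the following sketch. All class names, attribute names and the `rank` helper are hypothetical and chosen for illustration; they are not spaCy's actual API.

```python
from abc import ABC, abstractmethod
from typing import List


class ToyCandidate(ABC):
    """Hypothetical candidate interface: only a numerical entity ID is mandated."""

    @property
    @abstractmethod
    def entity_id(self) -> int:
        ...


class ToyInMemoryCandidate(ToyCandidate):
    """Hypothetical concrete candidate that also stores a prior probability."""

    def __init__(self, entity_id: int, prior_prob: float) -> None:
        self._entity_id = entity_id
        self.prior_prob = prior_prob

    @property
    def entity_id(self) -> int:
        return self._entity_id


class ToyKnowledgeBase(ABC):
    """Hypothetical knowledge base interface."""

    @abstractmethod
    def supports_prior_probs(self) -> bool:
        ...


class ToyInMemoryKB(ToyKnowledgeBase):
    def supports_prior_probs(self) -> bool:
        return True


class ToyRemoteKB(ToyKnowledgeBase):
    def supports_prior_probs(self) -> bool:
        return False


def rank(kb: ToyKnowledgeBase, candidates: List[ToyCandidate]) -> List[ToyCandidate]:
    """Sort by prior probability only if the knowledge base supports it."""
    if kb.supports_prior_probs():
        return sorted(candidates, key=lambda c: c.prior_prob, reverse=True)
    return list(candidates)
```

Keeping `prior_prob` out of the base interface means that callers check the knowledge base's capability instead of assuming every candidate type provides the attribute.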
Adriane Boyd
6ae7618418
Clean up Vocab constructor (#12290)
...
* Clean up Vocab constructor
* Change effective type of `strings` from `Iterable[str]` to `Optional[StringStore]`
* Don't automatically add strings to vocab
* Change default values to `None`
* Remove `**deprecated_kwargs`
* Format
2023-03-19 23:41:20 +01:00
Sofie Van Landeghem
b83407388a
fix import
2023-03-19 23:34:00 +01:00
Sofie Van Landeghem
0365d3d2e2
fix docs
2023-03-19 23:31:02 +01:00
Sofie Van Landeghem
9e71adc074
Use Candidate output type for toy generators in the test suite to mimic best practices
2023-03-19 23:27:20 +01:00
Raphael Mitsch
faede7155c
Update spacy/kb/kb.pyx
...
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-03-17 11:32:41 +01:00
Raphael Mitsch
4d8dce5ba2
Update spacy/errors.py
...
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-03-17 11:28:18 +01:00
Raphael Mitsch
2377b67f81
Update W401 test.
2023-03-17 08:59:52 +01:00
Raphael Mitsch
307bbab285
Update spacy/ml/models/entity_linker.py
...
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-03-17 08:58:28 +01:00
Raphael Mitsch
978fbdcee1
Update spacy/kb/kb.pyx
...
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-03-17 08:58:17 +01:00
Raphael Mitsch
830939ee64
Fix InMemoryCandidate attribute names.
2023-03-15 10:51:34 +01:00
Raphael Mitsch
80fb0666b9
Update docstrings.
2023-03-15 09:25:41 +01:00
Raphael Mitsch
3cfc1c6acc
Mention -> alias renaming. Drop Candidate.mentions(). Drop InMemoryLookupKB.get_alias_candidates() from docs.
2023-03-15 09:23:31 +01:00
Raphael Mitsch
961795d9f1
Update spacy/ml/models/entity_linker.py
...
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-03-15 09:20:25 +01:00
Raphael Mitsch
b7b4282821
Update spacy/ml/models/entity_linker.py
...
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-03-15 09:20:07 +01:00
Raphael Mitsch
28dbed64cb
Update alias/mention usage in doc(strings).
2023-03-14 13:33:05 +01:00
Raphael Mitsch
be858981e6
Update docstrings related to prior_prob.
2023-03-13 17:01:20 +01:00
Raphael Mitsch
4a921766f1
Remove prior_prob from supported properties in Candidate. Introduce KnowledgeBase.supports_prior_probs().
2023-03-13 16:54:38 +01:00
Raphael Mitsch
6adc15178f
Partially fix alias/mention terminology usage. Convert Candidate to interface.
2023-03-13 14:26:14 +01:00
Raphael Mitsch
649c146e2c
Move attributes out of Candidate.
2023-03-13 09:21:08 +01:00
Raphael Mitsch
ce23942320
Merge branch 'refactor/el-candidates' of github.com:rmitsch/spaCy into refactor/el-candidates
2023-03-10 09:04:10 +01:00
Raphael Mitsch
348dd1c87e
Update docstrings.
2023-03-10 09:03:41 +01:00
Raphael Mitsch
27053912da
Drop unused InMemoryCandidate._entity_hash.
2023-03-10 09:00:30 +01:00
Raphael Mitsch
6fc7997c06
Fix entity_id refactoring problems in docstrings.
2023-03-10 08:55:32 +01:00
Raphael Mitsch
34e092e4e5
Update spacy/kb/candidate.pxd
...
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-03-09 16:15:39 +01:00
Raphael Mitsch
c61654eef8
Drop storing mention.
2023-03-09 15:04:10 +01:00
Raphael Mitsch
b0ee34185d
Add abstract class check.
2023-03-09 14:56:44 +01:00
Raphael Mitsch
845864beb4
Remove redundant entry in setup.py.
2023-03-09 14:55:10 +01:00
Raphael Mitsch
b476041417
Port Candidate and InMemoryCandidate to Cython.
2023-03-09 14:44:41 +01:00
Raphael Mitsch
1c937db3af
Rename 'alias' to 'mention'.
2023-03-09 12:06:15 +01:00
Raphael Mitsch
1ba2fc4207
Update website/docs/api/kb.mdx
...
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-03-09 12:01:42 +01:00
Madeesh Kannan
520279ff7c
Tok2Vec: Add distill method (#12108)
...
* `Tok2Vec`: Add `distill` method
* `Tok2Vec`: Refactor `update`
* Add `Tok2Vec.distill` test
* Update `distill` signature to accept `Example`s instead of separate teacher and student docs
* Add docs
* Remove docstring
* Update test
* Remove `update` calls from test
* Update `Tok2Vec.distill` docstring
2023-03-09 09:37:19 +01:00
Raphael Mitsch
cea58ade89
Simplify interface for int/str representations.
2023-03-07 14:35:38 +01:00
Raphael Mitsch
0c63940407
Merge branch 'v4' into refactor/el-candidates
...
# Conflicts:
# spacy/errors.py
2023-03-07 14:00:23 +01:00
Raphael Mitsch
f8a02f7fef
Updated error code.
2023-03-07 13:58:42 +01:00
Raphael Mitsch
082992aebb
Update spacy/kb/candidate.py
...
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-03-07 13:54:11 +01:00