Commit Graph

16173 Commits

Author SHA1 Message Date
Madeesh Kannan
520279ff7c
Tok2Vec: Add distill method (#12108)
* `Tok2Vec`: Add `distill` method

* `Tok2Vec`: Refactor `update`

* Add `Tok2Vec.distill` test

* Update `distill` signature to accept `Example`s instead of separate teacher and student docs

* Add docs

* Remove docstring

* Update test

* Remove `update` calls from test

* Update `Tok2Vec.distill` docstring
2023-03-09 09:37:19 +01:00
Marcus Blättermann
b309336712
Make sure to run Python setup before NPM dev mode (#12384) 2023-03-08 11:59:10 +01:00
Paul O'Leary McCann
e656189ec3
Change GPU efficient textcat to use CNN, not BOW in generated configs (#11900)
* Change GPU efficient textcat to use CNN, not BOW

If you generate a config with a textcat component using GPU
(transformers), the default option (efficiency) uses a BOW architecture,
which does not use tok2vec features. While that can make sense as part
of a larger pipeline, in the case of just a transformer and a textcat,
that means the transformer is doing a lot of work for no purpose.

This changes it so that the CNN architecture is used instead. It could
also be changed to be the same as the accuracy config, which uses the
ensemble architecture.

* Add the transformer when using a textcat with GPU

* Switch ubuntu-latest to ubuntu-20.04 in main tests (#11928)

* Switch ubuntu-latest to ubuntu-20.04 in main tests

* Only use 20.04 for 3.6

* Require thinc v8.1.7

* Require thinc v8.1.8

* Break up longer expression

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-03-07 17:47:45 +01:00
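The selection rule described in the commit message above can be sketched as a small lookup. This is an illustrative sketch only, not the config template spaCy ships: the function name `choose_textcat_architecture` and the short labels `"bow"`, `"cnn"`, and `"ensemble"` are hypothetical stand-ins for the registered architectures, and the CPU branch is an assumption that the commit message does not state.

```python
def choose_textcat_architecture(hardware: str, optimize: str) -> str:
    """Pick a textcat architecture label for a generated config.

    Hypothetical helper mirroring the rule in the commit message.
    """
    if hardware not in ("cpu", "gpu"):
        raise ValueError(f"unknown hardware: {hardware!r}")
    if optimize not in ("efficiency", "accuracy"):
        raise ValueError(f"unknown optimization target: {optimize!r}")

    # The accuracy config uses the ensemble architecture.
    if optimize == "accuracy":
        return "ensemble"

    # GPU + efficiency: BOW ignores tok2vec features, so the transformer
    # would do work for nothing. The commit switches this case to CNN.
    if hardware == "gpu":
        return "cnn"

    # Assumption (not stated in the commit): CPU + efficiency keeps BOW.
    return "bow"
```

For example, `choose_textcat_architecture("gpu", "efficiency")` yields the CNN label, so the textcat component consumes the transformer's output instead of discarding it.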
Raphael Mitsch
cea58ade89 Simplify interface for int/str representations. 2023-03-07 14:35:38 +01:00
Raphael Mitsch
0c63940407 Merge branch 'v4' into refactor/el-candidates
# Conflicts:
#	spacy/errors.py
2023-03-07 14:00:23 +01:00
Raphael Mitsch
f8a02f7fef Updated error code. 2023-03-07 13:58:42 +01:00
Raphael Mitsch
082992aebb
Update spacy/kb/candidate.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-03-07 13:54:11 +01:00
Sofie Van Landeghem
3bf4539e31
fix types (#12365) 2023-03-07 13:29:08 +01:00
Raphael Mitsch
41b3a0d932
Drop support for EntityLinker_v1. (#12377) 2023-03-07 13:10:45 +01:00
Raphael Mitsch
8dbb74c9c0 Merge branch 'v4' into refactor/el-candidates 2023-03-07 09:06:51 +01:00
Raphael Mitsch
4bdb359711 Merge branch 'v4' into feature/docwise-generator-batching 2023-03-07 09:06:10 +01:00
Adriane Boyd
260cb9c6fe
Raise error for non-default vectors with PretrainVectors (#12366) 2023-03-06 18:06:31 +01:00
Adriane Boyd
5ecb3babed
Update to use absolute imports in tests (#12372) 2023-03-06 17:30:17 +01:00
Adriane Boyd
8ca71f9591
Merge pull request #12371 from rmitsch/sync/master-into-v4
Sync `v4` with latest from `master`
2023-03-06 17:10:19 +01:00
Raphael Mitsch
749e446ee3 Merge branch 'master' into sync/master-into-v4
# Conflicts:
#	.github/azure-steps.yml
2023-03-06 16:27:56 +01:00
Adriane Boyd
0bbc620dd8
Partially work around pending deprecation of pkg_resources (#12368)
* Handle deprecation of pkg_resources

* Replace `pkg_resources` with `importlib_metadata` for `spacy info --url`
* Remove requirements check from `spacy project` given the lack of alternatives

* Fix installed model URL method and CI test

* Fix types/handling, simplify catch-all return

* Move imports instead of disabling requirements check

* Format

* Reenable test with ignored deprecation warning

* Fix except

* Fix return
2023-03-06 14:48:57 +01:00
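The move away from `pkg_resources` in the commit above can be illustrated with the standard library's `importlib.metadata`, the stdlib counterpart of the `importlib_metadata` backport the commit names. This is a minimal sketch of the general pattern, not spaCy's actual code; the helper name `installed_version` is hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent.

    Hypothetical helper for illustration; not part of spaCy.
    """
    try:
        # Reads the version from the installed distribution's metadata,
        # without importing pkg_resources.
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None
```

Returning `None` for a missing distribution keeps the caller's control flow simple; whether the real code does the same is not stated in the commit.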
Raphael Mitsch
d0abc321d8 Format. 2023-03-06 10:27:33 +01:00
Raphael Mitsch
8b24f31b65 Format docs. 2023-03-06 10:21:37 +01:00
Raphael Mitsch
f33f0ed160 Merge branch 'v4' into feature/docwise-generator-batching
# Conflicts:
#	spacy/pipeline/entity_linker.py
#	website/docs/api/entitylinker.mdx
2023-03-06 10:21:12 +01:00
Raphael Mitsch
e4e55b88b3 Update leftover get_candidates() mention in docs. 2023-03-06 10:08:10 +01:00
Raphael Mitsch
bb7418ebdd Modify EL batching system. 2023-03-06 10:05:46 +01:00
Raphael Mitsch
2ac586fdb5 Update error code in line with v4 convention. 2023-03-05 14:43:32 +01:00
Raphael Mitsch
670e1ca7c5 Fix mypy error. 2023-03-05 14:33:32 +01:00
Raphael Mitsch
5f40b3e523 Format. 2023-03-05 14:14:16 +01:00
Raphael Mitsch
38dce966e5 Refactor Candidate attributes and their usage. 2023-03-05 13:49:13 +01:00
Raphael Mitsch
94e57d0ed5 Refactor Candidate attribute names. Update docs and tests accordingly. 2023-03-03 11:08:17 +01:00
Raphael Mitsch
46fe069f87 Rename alias -> mention. 2023-03-03 10:29:53 +01:00
Raphael Mitsch
61bacf81bd
Update website/docs/api/kb.mdx
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-03-03 09:54:28 +01:00
Sofie Van Landeghem
04f41854c1
Merge pull request #12356 from rmitsch/sync/master-into-v4
Sync `v4` with latest from `master`
2023-03-03 09:31:45 +01:00
Raphael Mitsch
3beda2b23a Merge branch 'refactor/el-candidates' into refactor/span-group-for-mentions
# Conflicts:
#	spacy/ml/models/entity_linker.py
#	website/docs/api/inmemorylookupkb.mdx
2023-03-03 08:32:38 +01:00
Raphael Mitsch
1ea31552be Merge branch 'master' into sync/master-into-v4
# Conflicts:
#	requirements.txt
#	spacy/pipeline/entity_linker.py
#	spacy/util.py
#	website/docs/api/entitylinker.mdx
2023-03-02 16:24:15 +01:00
Raphael Mitsch
6aa6b86d49
Make generation of empty KnowledgeBase instances configurable in EntityLinker (#12320)
* Make empty_kb() configurable.

* Format.

* Update docs.

* Be more specific in KB serialization test.

* Update KB serialization tests. Update docs.

* Remove doc update for batched candidate generation.

* Fix serialization of subclassed KB in tests.

* Format.

* Update docstring.

* Update docstring.

* Switch from pickle to json for custom field serialization.
2023-03-01 16:02:55 +01:00
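The last item of the commit above, switching custom field serialization from pickle to JSON, can be sketched with the standard library alone. The function names below are hypothetical and are not taken from spaCy's tests. The commit does not state its motivation; one general difference is that loading JSON never executes code, whereas unpickling untrusted data can.

```python
import json
from typing import Any, Dict


def custom_fields_to_bytes(fields: Dict[str, Any]) -> bytes:
    """Serialize JSON-compatible custom fields to UTF-8 bytes."""
    # sort_keys makes the output deterministic for equal inputs.
    return json.dumps(fields, sort_keys=True).encode("utf-8")


def custom_fields_from_bytes(data: bytes) -> Dict[str, Any]:
    """Restore custom fields written by custom_fields_to_bytes."""
    return json.loads(data.decode("utf-8"))
```

The trade-off is that only JSON-compatible values (strings, numbers, booleans, `None`, lists, and string-keyed dicts) round-trip unchanged.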
Adriane Boyd
da75896ef5
Return Tuple[Span] for all Doc/Span attrs that provide spans (#12288)
* Return Tuple[Span] for all Doc/Span attrs that provide spans

* Update Span types
2023-03-01 16:00:02 +01:00
kadarakos
56aa0cc75f
Displacy doc fix (#12352)
* more details for color setting

* more details for color setting

* prettier
2023-03-01 15:38:23 +01:00
Raphael Mitsch
9bd498cdae Fix docstring and docs. 2023-03-01 15:09:24 +01:00
Raphael Mitsch
257bca3959 Format. 2023-03-01 14:54:03 +01:00
Raphael Mitsch
fa390618c8 Adjust Candidate to support and mandate numerical entity IDs. 2023-03-01 14:50:58 +01:00
Raphael Mitsch
49abf4fb3a Rename Candidate to InMemoryCandidate, BaseCandidate to Candidate. 2023-03-01 14:27:50 +01:00
Raphael Mitsch
417e8fea8b
Update spacy/kb/candidate.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-03-01 13:51:33 +01:00
Raphael Mitsch
21fa22de08 Merge branch 'refactor/el-candidates' of github.com:rmitsch/spaCy into refactor/el-candidates 2023-03-01 13:48:46 +01:00
Raphael Mitsch
3da0712582 Update doc string of BaseCandidate.__init__(). 2023-03-01 13:15:38 +01:00
Raphael Mitsch
0680958476
Update spacy/kb/candidate.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2023-03-01 12:42:08 +01:00
Sofie Van Landeghem
74cae47bf6
rely on is_empty property instead of __len__ (#12347) 2023-03-01 12:06:07 +01:00
Raphael Mitsch
efbc3d37b3
Update docs w.r.t. spacy.CandidateBatchGenerator.v1. (#12350) 2023-03-01 11:01:35 +01:00
Adriane Boyd
33864f1d07
Add new tags in docs for #12334 (#12348) 2023-03-01 10:46:13 +01:00
Adriane Boyd
8f058e39bd
Fix error message for displacy auto_select_port (#12343) 2023-02-28 16:36:03 +01:00
Raphael Mitsch
50b34751eb Update docs. 2023-02-28 15:38:28 +01:00
Raphael Mitsch
8596fb8b88 Change type of mentions used for entity candidate lookup from Iterable[Span] to SpanGroup. 2023-02-28 15:28:05 +01:00
TAN Long
071667376a
Add new REL_OPs: >+, >-, <+, and <- (#12334)
* Add immediate left/right child/parent dependency relations

* Add tests for new REL_OPs: `>+`, `>-`, `<+`, and `<-`.

---------

Co-authored-by: Tan Long <tanloong@foxmail.com>
2023-02-28 14:36:33 +01:00
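The four relations added in the commit above ("immediate left/right child/parent") can be sketched in plain Python over a list of head indices. This is an illustration of the intended semantics as understood here, not the `DependencyMatcher` implementation; the function `matches_rel_op` and the `heads` encoding (each token stores its head's index, with the root pointing to itself) are assumptions made for the example.

```python
from typing import List


def matches_rel_op(op: str, a: int, b: int, heads: List[int]) -> bool:
    """Check whether tokens at indices a and b satisfy `A op B`.

    heads[i] is the index of token i's syntactic head (hypothetical encoding).
    """
    if op == ">+":
        # B is a child of A, directly to A's right.
        return heads[b] == a and b == a + 1
    if op == ">-":
        # B is a child of A, directly to A's left.
        return heads[b] == a and b == a - 1
    if op == "<+":
        # B is the parent of A, directly to A's right.
        return heads[a] == b and b == a + 1
    if op == "<-":
        # B is the parent of A, directly to A's left.
        return heads[a] == b and b == a - 1
    raise ValueError(f"unsupported operator: {op!r}")


# "She ate fresh bread": She -> ate, ate = root, fresh -> bread, bread -> ate
heads = [1, 1, 3, 1]
left_child = matches_rel_op(">-", 1, 0, heads)     # "She" sits just left of "ate"
right_child = matches_rel_op(">+", 1, 3, heads)    # "bread" is not adjacent to "ate"
right_parent = matches_rel_op("<+", 2, 3, heads)   # "bread" heads "fresh" from the right
```

Each operator combines a head/child check with an adjacency check, which is what separates the new operators from the plain `>` and `<` relations.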
Raphael Mitsch
a97ef65b33 Fix .entity_ typo in _add_activations() usage. 2023-02-28 14:22:27 +01:00