Commit Graph

15956 Commits

Author SHA1 Message Date
Adriane Boyd
2713890ecc
Update website/docs/api/spancategorizer.mdx 2023-03-08 10:59:24 +01:00
Adriane Boyd
f53d945b2d
Fix merge 2023-03-08 10:22:23 +01:00
Adriane Boyd
5b11c4b5de
Merge branch 'master' into add/exclusive-spancat 2023-03-08 09:50:10 +01:00
Paul O'Leary McCann
e656189ec3
Change GPU efficient textcat to use CNN, not BOW in generated configs (#11900)
* Change GPU efficient textcat to use CNN, not BOW

If you generate a config with a textcat component using GPU
(transformers), the default option (efficiency) uses a BOW architecture,
which does not use tok2vec features. While that can make sense as part
of a larger pipeline, in the case of just a transformer and a textcat,
that means the transformer is doing a lot of work for no purpose.

This changes it so that the CNN architecture is used instead. It could
also be changed to be the same as the accuracy config, which uses the
ensemble architecture.

* Add the transformer when using a textcat with GPU

* Switch ubuntu-latest to ubuntu-20.04 in main tests (#11928)

* Switch ubuntu-latest to ubuntu-20.04 in main tests

* Only use 20.04 for 3.6

* Require thinc v8.1.7

* Require thinc v8.1.8

* Break up longer expression

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-03-07 17:47:45 +01:00
kadarakos
efea793248 prettier 2023-03-07 16:15:15 +00:00
kadarakos
086bb7f544
Update website/docs/api/spancategorizer.mdx
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-03-07 17:13:53 +01:00
kadarakos
75b9819553
Update website/docs/api/spancategorizer.mdx
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-03-07 17:13:31 +01:00
kadarakos
aaf119b949 revert 2023-03-07 14:21:55 +00:00
kadarakos
a09389f1a5 revert 2023-03-07 14:20:51 +00:00
kadarakos
a4ef74468e revert 2023-03-07 14:19:49 +00:00
kadarakos
6587ac5324 revert 2023-03-07 14:19:06 +00:00
kadarakos
641609070f replace tags 2023-03-07 14:15:27 +00:00
Sofie Van Landeghem
3bf4539e31
fix types (#12365) 2023-03-07 13:29:08 +01:00
kadarakos
91e0c3d782 Merge branch 'add/exclusive-spancat' of github.com:ljvmiranda921/spaCy into exclusive-spancat 2023-03-07 09:24:02 +00:00
kadarakos
a433d91d42 add Tag to new features 2023-03-07 09:23:43 +00:00
Adriane Boyd
260cb9c6fe
Raise error for non-default vectors with PretrainVectors (#12366) 2023-03-06 18:06:31 +01:00
Adriane Boyd
5ecb3babed
Update to use absolute imports in tests (#12372) 2023-03-06 17:30:17 +01:00
kadarakos
308002f00d
Update spacy/pipeline/spancat.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-03-06 15:14:55 +01:00
kadarakos
51a53de239
Update spacy/pipeline/spancat.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-03-06 15:14:35 +01:00
Adriane Boyd
0bbc620dd8
Partially work around pending deprecation of pkg_resources (#12368)
* Handle deprecation of pkg_resources

* Replace `pkg_resources` with `importlib_metadata` for `spacy info --url`
* Remove requirements check from `spacy project` given the lack of
alternatives

* Fix installed model URL method and CI test

* Fix types/handling, simplify catch-all return

* Move imports instead of disabling requirements check

* Format

* Reenable test with ignored deprecation warning

* Fix except

* Fix return
2023-03-06 14:48:57 +01:00
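
As a rough illustration of the `pkg_resources` replacement described in the commit above, the sketch below looks up the URL an installed package came from. It uses the standard-library `importlib.metadata`, whereas the commit names the `importlib_metadata` backport; the two are assumed to behave the same for these calls. The helper name `installed_package_url` is hypothetical, and reading pip's `direct_url.json` record is an assumption about one way to do this, not spaCy's actual implementation.

```python
import json
from importlib import metadata
from typing import Optional


def installed_package_url(dist_name: str) -> Optional[str]:
    """Return the URL a distribution was installed from, if one was recorded.

    Hypothetical helper: it reads the `direct_url.json` file that pip writes
    for direct URL installs. Packages installed from an index have no such
    file, so None is returned for them.
    """
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        # The package is not installed in this environment.
        return None
    raw = dist.read_text("direct_url.json")  # None when the file is absent
    if raw is None:
        return None
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    # Only trust the expected shape: a JSON object with a string "url" field.
    url = data.get("url") if isinstance(data, dict) else None
    return url if isinstance(url, str) else None
```

Returning `None` for every failure mode keeps the caller simple, in the spirit of the "simplify catch-all return" bullet, though the real code may distinguish these cases.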
kadarakos
854d1614a9 Intervals to private and document 'name' param 2023-03-03 15:51:57 +00:00
kadarakos
6d67ab7670 typo in docs 2023-03-03 15:43:59 +00:00
kadarakos
97fd9741c6 don't keep recomputing self._label_map for each span 2023-03-03 15:41:41 +00:00
kadarakos
c7e7343999
Update spacy/pipeline/spancat.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-03-03 15:56:00 +01:00
kadarakos
fded200128
Update spacy/pipeline/spancat.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-03-03 15:55:42 +01:00
kadarakos
b972328337
Update spacy/pipeline/spancat.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-03-03 15:55:15 +01:00
kadarakos
0a74e8c260
Update spacy/pipeline/spancat.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-03-03 15:54:54 +01:00
kadarakos
acb927b79c
Update website/docs/api/spancategorizer.mdx
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-03-03 15:54:23 +01:00
kadarakos
61576b50c7 Update website/docs/api/spancategorizer.mdx
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-03-03 15:54:11 +01:00
Raphael Mitsch
6aa6b86d49
Make generation of empty KnowledgeBase instances configurable in EntityLinker (#12320)
* Make empty_kb() configurable.

* Format.

* Update docs.

* Be more specific in KB serialization test.

* Update KB serialization tests. Update docs.

* Remove doc update for batched candidate generation.

* Fix serialization of subclassed KB in tests.

* Format.

* Update docstring.

* Update docstring.

* Switch from pickle to json for custom field serialization.
2023-03-01 16:02:55 +01:00
Adriane Boyd
6182213fef
Merge branch 'master' into add/exclusive-spancat 2023-03-01 15:51:16 +01:00
kadarakos
56aa0cc75f
Displacy doc fix (#12352)
* more details for color setting

* more details for color setting

* prettier
2023-03-01 15:38:23 +01:00
Sofie Van Landeghem
74cae47bf6
rely on is_empty property instead of __len__ (#12347) 2023-03-01 12:06:07 +01:00
Raphael Mitsch
efbc3d37b3
Update docs w.r.t. spacy.CandidateBatchGenerator.v1. (#12350) 2023-03-01 11:01:35 +01:00
Adriane Boyd
33864f1d07
Add new tags in docs for #12334 (#12348) 2023-03-01 10:46:13 +01:00
Adriane Boyd
8f058e39bd
Fix error message for displacy auto_select_port (#12343) 2023-02-28 16:36:03 +01:00
TAN Long
071667376a
Add new REL_OPs: >+, >-, <+, and <- (#12334)
* Add immediate left/right child/parent dependency relations

* Add tests for new REL_OPs: `>+`, `>-`, `<+`, and `<-`.

---------

Co-authored-by: Tan Long <tanloong@foxmail.com>
2023-02-28 14:36:33 +01:00
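
The four operators added in the commit above can be sketched in plain Python without spaCy. The sketch assumes that "immediate" means the two tokens are adjacent in the text and that `A OP B` is read with `A` as the anchor token; the function name `make_rel_ops` and the toy parse are illustrative and not taken from the commit.

```python
from typing import Callable, Dict, List


def make_rel_ops(heads: List[int]) -> Dict[str, Callable[[int, int], bool]]:
    """Build predicates for the immediate child/parent operators.

    heads[i] is the index of token i's syntactic head; the root points to
    itself. Each predicate takes token indices (a, b) and tests `a OP b`.
    """
    return {
        # b is a child of a, sitting directly to the right of a
        ">+": lambda a, b: heads[b] == a and b == a + 1,
        # b is a child of a, sitting directly to the left of a
        ">-": lambda a, b: heads[b] == a and b == a - 1,
        # b is the parent of a, sitting directly to the right of a
        "<+": lambda a, b: heads[a] == b and b == a + 1,
        # b is the parent of a, sitting directly to the left of a
        "<-": lambda a, b: heads[a] == b and b == a - 1,
    }


# Toy parse of "She quickly ran home": every token attaches to "ran" (index 2).
tokens = ["She", "quickly", "ran", "home"]
heads = [2, 2, 2, 2]
ops = make_rel_ops(heads)

print(ops[">+"](2, 3))  # "ran" >+ "home"
print(ops[">-"](2, 0))  # "ran" >- "She": a child, but not adjacent
```

The second call shows how these operators differ from the plain `>` child relation: "She" is a child of "ran", but it is not adjacent, so the immediate variant does not hold.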
lise-brinck
e2de188cf1
Bugfix/swedish tokenizer (#12315)
* add unittest for explosion#12311

* create punctuation.py for swedish

* removed : from infixes in swedish punctuation.py

* allow : as infix if succeeding char is uppercase
2023-02-27 10:53:45 +01:00
Adriane Boyd
4539fbae17
Revert "Fix FUZZY operator definition (#12318)" (#12336)
This reverts commit daedc45d05.

The default length depends on the length of the pattern string and was
correct for this example.
2023-02-27 09:48:36 +01:00
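
The revert message above says only that the default edit distance depends on the length of the pattern string. One rule consistent with that statement, assumed here rather than confirmed by this log, allows at least 2 edits and up to roughly 30% of the pattern length. The function names below are hypothetical, and the sketch uses a plain Levenshtein distance in which a transposition counts as two edits.

```python
def default_max_edits(pattern: str) -> int:
    """Assumed default: at least 2 edits, up to about 30% of the pattern length."""
    return max(2, round(0.3 * len(pattern)))


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def fuzzy_match(text: str, pattern: str) -> bool:
    """True if text is within the length-dependent default distance of pattern."""
    return levenshtein(text, pattern) <= default_max_edits(pattern)


print(default_max_edits("cat"))         # short pattern: the floor of 2 applies
print(default_max_edits("definition"))  # 10 characters: 30% gives 3
```

Under this assumed rule, a claim that the default is "2 and not 3" can hold for short patterns and fail for longer ones, which matches the reason given for the revert.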
Kevin Humphreys
acdd993071
Matcher performance fix for extension predicates: use shared key function (#12272)
* standardize predicate key format

* single key function

* Make optional args in key function keyword-only

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-02-27 08:35:08 +01:00
Paul O'Leary McCann
1e8bac99f3
Add tests for projects to master (#12303)
* Add tests for projects to master

* Fix git clone related issues on Windows

* Add stat import
2023-02-23 10:22:57 +01:00
andyjessen
daedc45d05
Fix FUZZY operator definition (#12318)
* Fix FUZZY operator definition

The default length of the FUZZY operator is 2 and not 3.

* adjust edit distance in matcher usage docs too

---------

Co-authored-by: svlandeg <svlandeg@github.com>
2023-02-23 09:37:40 +01:00
kadarakos
413ca22587 run prettier 2023-02-20 17:05:50 +00:00
kadarakos
6e5e77ea79 update docs 2023-02-20 17:03:41 +00:00
kadarakos
86d3e78c64 make label mapper private 2023-02-20 17:02:27 +00:00
kadarakos
813b3551ed Merge branch 'add/exclusive-spancat' of github.com:ljvmiranda921/spaCy into spancat-exclusive 2023-02-20 10:52:34 +00:00
kadarakos
6f3b257cf4 raise error instead of just print 2023-02-20 10:48:41 +00:00
kadarakos
43d5cab2c2
Update spacy/pipeline/spancat.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-02-20 11:37:51 +01:00
kadarakos
e847487ebb remove duplicate declaration 2023-02-20 10:36:54 +00:00
kadarakos
af3fa670d4
Update spacy/tests/pipeline/test_spancat.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-02-20 11:36:32 +01:00