Commit Graph

9347 Commits

Author SHA1 Message Date
Adriane Boyd
5b11c4b5de
Merge branch 'master' into add/exclusive-spancat 2023-03-08 09:50:10 +01:00
Paul O'Leary McCann
e656189ec3
Change GPU efficient textcat to use CNN, not BOW in generated configs (#11900)
* Change GPU efficient textcat to use CNN, not BOW

If you generate a config with a textcat component using GPU
(transformers), the default option (efficiency) uses a BOW architecture,
which does not use tok2vec features. While that can make sense as part
of a larger pipeline, in the case of just a transformer and a textcat,
that means the transformer is doing a lot of work for no purpose.

This changes it so that the CNN architecture is used instead. It could
also be changed to be the same as the accuracy config, which uses the
ensemble architecture.

* Add the transformer when using a textcat with GPU

* Switch ubuntu-latest to ubuntu-20.04 in main tests (#11928)

* Switch ubuntu-latest to ubuntu-20.04 in main tests

* Only use 20.04 for 3.6

* Require thinc v8.1.7

* Require thinc v8.1.8

* Break up longer expression

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-03-07 17:47:45 +01:00
Sofie Van Landeghem
3bf4539e31
fix types (#12365) 2023-03-07 13:29:08 +01:00
Adriane Boyd
260cb9c6fe
Raise error for non-default vectors with PretrainVectors (#12366) 2023-03-06 18:06:31 +01:00
Adriane Boyd
5ecb3babed
Update to use absolute imports in tests (#12372) 2023-03-06 17:30:17 +01:00
kadarakos
308002f00d
Update spacy/pipeline/spancat.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-03-06 15:14:55 +01:00
kadarakos
51a53de239
Update spacy/pipeline/spancat.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-03-06 15:14:35 +01:00
Adriane Boyd
0bbc620dd8
Partially work around pending deprecation of pkg_resources (#12368)
* Handle deprecation of pkg_resources

* Replace `pkg_resources` with `importlib_metadata` for `spacy info --url`
* Remove requirements check from `spacy project` given the lack of alternatives

* Fix installed model URL method and CI test

* Fix types/handling, simplify catch-all return

* Move imports instead of disabling requirements check

* Format

* Reenable test with ignored deprecation warning

* Fix except

* Fix return
2023-03-06 14:48:57 +01:00
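
The commit above moves package metadata lookups away from `pkg_resources`. As a rough illustration only, the sketch below uses the standard-library `importlib.metadata` module (the commit itself names the `importlib_metadata` backport); the helper `installed_version` is a hypothetical name and is not taken from the spaCy code base.

```python
from importlib import metadata  # standard library since Python 3.8


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent.

    Hypothetical helper for illustration; not spaCy's implementation.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # Raised when no distribution with this name is installed.
        return None


if __name__ == "__main__":
    print(installed_version("spacy"))
```

Unlike `pkg_resources`, this lookup does not need setuptools at runtime, which is one common reason for this kind of migration.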
kadarakos
854d1614a9 Intervals to private and document 'name' param 2023-03-03 15:51:57 +00:00
kadarakos
97fd9741c6 don't keep recomputing self._label_map for each span 2023-03-03 15:41:41 +00:00
kadarakos
c7e7343999
Update spacy/pipeline/spancat.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-03-03 15:56:00 +01:00
kadarakos
fded200128
Update spacy/pipeline/spancat.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-03-03 15:55:42 +01:00
kadarakos
b972328337
Update spacy/pipeline/spancat.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-03-03 15:55:15 +01:00
kadarakos
0a74e8c260
Update spacy/pipeline/spancat.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-03-03 15:54:54 +01:00
Raphael Mitsch
6aa6b86d49
Make generation of empty KnowledgeBase instances configurable in EntityLinker (#12320)
* Make empty_kb() configurable.

* Format.

* Update docs.

* Be more specific in KB serialization test.

* Update KB serialization tests. Update docs.

* Remove doc update for batched candidate generation.

* Fix serialization of subclassed KB in tests.

* Format.

* Update docstring.

* Update docstring.

* Switch from pickle to json for custom field serialization.
2023-03-01 16:02:55 +01:00
Adriane Boyd
6182213fef
Merge branch 'master' into add/exclusive-spancat 2023-03-01 15:51:16 +01:00
Sofie Van Landeghem
74cae47bf6
rely on is_empty property instead of __len__ (#12347) 2023-03-01 12:06:07 +01:00
Adriane Boyd
8f058e39bd
Fix error message for displacy auto_select_port (#12343) 2023-02-28 16:36:03 +01:00
TAN Long
071667376a
Add new REL_OPs: >+, >-, <+, and <- (#12334)
* Add immediate left/right child/parent dependency relations

* Add tests for new REL_OPs: `>+`, `>-`, `<+`, and `<-`.

---------

Co-authored-by: Tan Long <tanloong@foxmail.com>
2023-02-28 14:36:33 +01:00
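
The four operators added in the commit above describe immediate left/right child/parent relations. A minimal pure-Python sketch of those semantics follows, assuming a parse is given as a list `heads` where `heads[i]` is the index of token `i`'s head. The function `rel_op_holds` and this representation are illustrative assumptions, not the DependencyMatcher implementation.

```python
def rel_op_holds(op, a, b, heads):
    """Check whether `A op B` holds for token indices a and b.

    heads[i] is the index of the syntactic head of token i
    (a root token is its own head). Illustrative sketch only.
    """
    if op == ">+":  # B is a child of A and sits directly to A's right
        return heads[b] == a and b == a + 1
    if op == ">-":  # B is a child of A and sits directly to A's left
        return heads[b] == a and b == a - 1
    if op == "<+":  # B is the parent of A and sits directly to A's right
        return heads[a] == b and b == a + 1
    if op == "<-":  # B is the parent of A and sits directly to A's left
        return heads[a] == b and b == a - 1
    raise ValueError(f"unsupported operator: {op}")


# "The quick fox jumped": The -> fox, quick -> fox, fox -> jumped, jumped = root
heads = [2, 2, 3, 3]
print(rel_op_holds(">-", 2, 1, heads))  # "quick" is the immediate left child of "fox"
print(rel_op_holds("<+", 2, 3, heads))  # "jumped" is the immediate right parent of "fox"
```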
lise-brinck
e2de188cf1
Bugfix/swedish tokenizer (#12315)
* add unittest for explosion#12311

* create punctuation.py for swedish

* removed : from infixes in swedish punctuation.py

* allow : as infix if succeeding char is uppercase
2023-02-27 10:53:45 +01:00
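
The last bullet of the commit above says `:` is treated as an infix only when the following character is uppercase. One way to express that rule is with a regular-expression lookahead, sketched below. The pattern and the helper `split_colon_infix` are assumptions made for illustration and are not the contents of the Swedish `punctuation.py`.

```python
import re

# Colon preceded by a word character and followed by an uppercase letter
# (including the Swedish letters Å, Ä, Ö). Assumed pattern, for illustration.
COLON_INFIX = re.compile(r"(?<=\w)(:)(?=[A-ZÅÄÖ])")


def split_colon_infix(text):
    """Split text on infix colons that precede an uppercase letter."""
    return COLON_INFIX.split(text)


print(split_colon_infix("sak:Det"))  # colon before uppercase: split
print(split_colon_infix("S:t"))      # colon before lowercase: kept whole
```

Under this rule, abbreviations that use a colon before a lowercase letter, such as `S:t`, stay as a single token.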
Kevin Humphreys
acdd993071
Matcher performance fix for extension predicates: use shared key function (#12272)
* standardize predicate key format

* single key function

* Make optional args in key function keyword-only

---------

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-02-27 08:35:08 +01:00
Paul O'Leary McCann
1e8bac99f3
Add tests for projects to master (#12303)
* Add tests for projects to master

* Fix git clone related issues on Windows

* Add stat import
2023-02-23 10:22:57 +01:00
kadarakos
86d3e78c64 make label mapper private 2023-02-20 17:02:27 +00:00
kadarakos
813b3551ed Merge branch 'add/exclusive-spancat' of github.com:ljvmiranda921/spaCy into spancat-exclusive 2023-02-20 10:52:34 +00:00
kadarakos
6f3b257cf4 raise error instead of just print 2023-02-20 10:48:41 +00:00
kadarakos
43d5cab2c2
Update spacy/pipeline/spancat.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-02-20 11:37:51 +01:00
kadarakos
e847487ebb remove duplicate declaration 2023-02-20 10:36:54 +00:00
kadarakos
af3fa670d4
Update spacy/tests/pipeline/test_spancat.py
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2023-02-20 11:36:32 +01:00
Adriane Boyd
80bc140533
Add grc to langs with lexeme norms in spacy-lookups-data (#12287) 2023-02-16 17:57:02 +01:00
Edward
61b8454137
Adjust return type of registry.find (#12227)
* Fix registry find return type

* add dot

* Add type ignore for mypy

* update black formatting version

* add mypy ignore to package cli

* mypy type fix (for real)

* Update find description in spacy/util.py

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>

* adjust mypy directive

---------

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
2023-02-15 12:32:53 +01:00
kadarakos
afc3a5a4af black 2023-02-10 14:07:39 +00:00
kadarakos
a07aafc28e refactor make_span_group 2023-02-10 14:06:56 +00:00
kadarakos
a281a7c9a1 tests for make_span_group with negative labels 2023-02-10 14:06:07 +00:00
kadarakos
b98cba2bef black 2023-02-08 19:45:01 +00:00
kadarakos
43162029bc bugfix 2023-02-08 19:43:51 +00:00
kadarakos
ec941a128d single label make_spangroup test 2023-02-08 19:43:33 +00:00
kadarakos
6fc25f64dd add spans.attrs[scores] 2023-02-07 18:12:32 +00:00
kadarakos
afc3ce1c7e logical bug in configuration check 2023-02-06 19:05:35 +00:00
kadarakos
5c927effde mypy 2023-02-06 19:03:33 +00:00
kadarakos
c24b3785a6 replace single_label with add_negative_label and adjust inference 2023-02-06 18:54:30 +00:00
kadarakos
c864f12e28 remove spancat exclusive 2023-02-06 10:15:53 +00:00
kadarakos
b8cdcfb2f5 black 2023-02-02 15:23:05 +00:00
kadarakos
d13e494abd don't rely on default arguments 2023-02-02 10:36:36 +00:00
Sofie Van Landeghem
79ef6cf0f9
Have logging calls use string formatting types (#12215)
* change logging call for spacy.LookupsDataLoader.v1

* substitutions in language and _util

* various more substitutions

* add string formatting guidelines to contribution guidelines
2023-02-02 11:15:22 +01:00
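
The commit above switches logging calls to string formatting types, that is, passing a `%s` template plus arguments instead of a pre-formatted string. A small standard-library sketch of the pattern follows. The logger name, the `ListHandler` class and the `report_lookup` function are hypothetical and only serve the demonstration.

```python
import logging


class ListHandler(logging.Handler):
    """Collect log records in a list so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


logger = logging.getLogger("lazy_logging_example")
logger.setLevel(logging.INFO)
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)


def report_lookup(lang, table):
    # Template and arguments are passed separately; the logging module
    # interpolates them only when the message is actually rendered.
    logger.info("Loading table '%s' for language '%s'", table, lang)
    # Below the configured level: no record is created for this call.
    logger.debug("Table details: %s", table)


report_lookup("grc", "lexeme_norm")
print(handler.records[0].getMessage())
```

With an f-string the message would be built before the call regardless of the log level; with the template form the arguments are stored on the record and formatting is deferred.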
kadarakos
5ccb154972 more docstring and fix negative_label 2023-02-01 11:16:34 +00:00
kadarakos
edf9134e45 add docstrings 2023-01-31 17:06:20 +00:00
kadarakos
079f09b97c black 2023-01-31 16:33:06 +00:00
kadarakos
8a807ef1dd black 2023-01-31 16:30:12 +00:00
kadarakos
dceeb02b94 wire up different make_spangroups for single and multilabel 2023-01-31 16:27:26 +00:00
kadarakos
52e7324df4 Merge branch 'master' into spancat-exclusive 2023-01-31 16:05:08 +00:00