
14641 Commits

Author SHA1 Message Date
Sofie Van Landeghem
3daf57d70c
Small spancat fixes (#8614)
* two small fixes + additional tests

* rename
2021-07-06 14:15:41 +02:00
Ines Montani
327f83573a
Move scores per type handling into util function (#8590) 2021-07-06 13:02:37 +02:00
Adriane Boyd
5fd0b5207e
Fix vectors check for sourced components (#8559)
* Fix vectors check for sourced components

Since vectors are not loaded when components are sourced, store a hash
for the vectors of each sourced component and compare it to the loaded
vectors after the vectors are loaded from the `[initialize]` block.

* Pop temporary info

* Remove stored hash in remove_pipe

* Add default for pop

* Add additional convert/debug/assemble CLI tests
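The hash-and-compare idea described in this commit message can be sketched roughly as follows. This is a standalone illustration, not spaCy's implementation: the function names, the `sourced_hashes` key, and the use of SHA-256 over raw bytes are all assumptions made for the example.

```python
import hashlib
from typing import Dict, List


def vectors_hash(data: bytes) -> str:
    """Hash the serialized vectors (SHA-256 is an assumption for this sketch)."""
    return hashlib.sha256(data).hexdigest()


def record_sourced(meta: Dict, component: str, source_vectors: bytes) -> None:
    """Store a hash of the source pipeline's vectors for one sourced component."""
    # "sourced_hashes" is a hypothetical key holding temporary bookkeeping info.
    meta.setdefault("sourced_hashes", {})[component] = vectors_hash(source_vectors)


def check_sourced(meta: Dict, loaded_vectors: bytes) -> List[str]:
    """Return the sourced components whose stored hash differs from the loaded vectors."""
    # Pop the temporary info; the default keeps pipelines without
    # sourced components from raising a KeyError.
    hashes = meta.pop("sourced_hashes", {})
    loaded = vectors_hash(loaded_vectors)
    return [name for name, stored in hashes.items() if stored != loaded]
```

The check is meant to run once the vectors have actually been loaded, which is why the hashes are stored first and compared later.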
2021-07-06 12:43:17 +02:00
Adriane Boyd
29906884c5
Raise an error for textcat with <2 labels (#8584)
* Raise an error for textcat with <2 labels

Raise an error if initializing a `textcat` component without at least
two labels.

* Add similar note to docs

* Update positive_label description in API docs
2021-07-06 12:35:22 +02:00
Paul O'Leary McCann
3b1d5350d0
Merge pull request #8609 from mathcass/model-documentation-typo
Fix a command typo in models.md
2021-07-06 14:43:58 +09:00
Cass
7d13fc799b
Fix a command typo in models.md
"dowmload" -> "download"
2021-07-05 18:44:18 -07:00
Ines Montani
8423864b50
Add docs notes on installing models from Python and in Jupyter [ci skip] (#8597) 2021-07-05 13:49:20 +02:00
Ines Montani
15108cd930
Merge pull request #8593 from yohasebe/patch-1 [ci skip] 2021-07-05 11:31:38 +10:00
Ines Montani
fdcd4003e5
Merge pull request #8592 from yohasebe/patch-2 [ci skip]
Adds contributor agreement yohasebe.md
2021-07-05 11:27:43 +10:00
Yoichiro Hasebe
596e04cbb4
Github repo info fixed for ruby-spacy 2021-07-04 18:55:17 +09:00
Yoichiro Hasebe
e541092088
Create yohasebe.md 2021-07-04 08:57:04 +09:00
Yoichiro Hasebe
2bdfa42107
Update universe.json 2021-07-04 08:44:39 +09:00
Ines Montani
3dcb747980
Merge pull request #8580 from explosion/autoblack
Auto-format code with black
2021-07-03 13:15:07 +10:00
explosion-bot
ee37288a1f Auto-format code with black 2021-07-02 07:48:26 +00:00
Ines Montani
c5c4e96597 Fix syntax [ci skip] 2021-07-02 17:46:56 +10:00
Ines Montani
6b905d67df Try workflow_dispatch and schedule [ci skip] 2021-07-02 17:45:27 +10:00
Ines Montani
70589e348e Commit as explosion-bot [ci skip] 2021-07-02 17:45:11 +10:00
Ines Montani
dd34a3a433 Try simpler approach [ci skip] 2021-07-02 17:40:49 +10:00
Ines Montani
2898331494 Improve logic [ci skip] 2021-07-02 17:37:35 +10:00
Ines Montani
519a9e29be Fix git login [ci skip] 2021-07-02 17:30:59 +10:00
Ines Montani
8961f36415 Commit manually in workflow [ci skip] 2021-07-02 17:27:48 +10:00
Ines Montani
2a5cbf1b0c Test different workflow trigger [ci skip] 2021-07-02 17:22:43 +10:00
Ines Montani
bbbaae0b5e Update triggers [ci skip] 2021-07-02 17:10:24 +10:00
Ines Montani
cdefb8cf1b Experimental: add autoblack.yml action [ci skip] 2021-07-02 17:07:05 +10:00
Adriane Boyd
2fc67e2aeb
Require thinc >=8.0.7 (#8572) 2021-07-01 16:55:09 +02:00
Ines Montani
af9d984407
Merge pull request #8405 from svlandeg/fix/whitespace_tokenizer [ci skip] 2021-06-30 20:52:59 +10:00
Adriane Boyd
2b8c679a3d
Fix duplicate spacy package CLI opts (#8551)
Use `-c` for `--code` and not additionally for `--create-meta`, in line
with the docs.
2021-06-30 11:23:26 +02:00
Nick Sorros
bb781ae7f7
Remove extra parenthesis from the example for spacy-streamlit (#8527) 2021-06-28 14:03:31 +02:00
Ines Montani
7f65902702
Merge pull request #8522 from adrianeboyd/chore/update-flake8
Update flake8 version in reqs and CI
2021-06-28 21:46:06 +10:00
Ines Montani
8bc235dcc0
Merge pull request #8523 from adrianeboyd/chore/cleanup-v3.1.0 2021-06-28 21:45:38 +10:00
Adriane Boyd
86d01e9229 Tidy up with flake8: imports, comparisons, etc. 2021-06-28 12:08:15 +02:00
Adriane Boyd
4d1ef8f695 Tidy up docs 2021-06-28 12:08:15 +02:00
Adriane Boyd
5eeb25f043 Tidy up code 2021-06-28 12:08:15 +02:00
Adriane Boyd
4b0ed73ed4 Update flake8 version in reqs and CI
* Update some unneeded forward refs related to flake8 checks
2021-06-28 11:29:36 +02:00
Ines Montani
93572dc12a
Merge pull request #8505 from bryant1410/patch-2 [ci skip]
Fix double slash in model release web page
2021-06-28 12:51:06 +10:00
Ines Montani
88ad41316c
Update issue template [ci skip] 2021-06-28 03:11:37 +02:00
Ines Montani
db6361ab6e
Update issue template [ci skip] 2021-06-28 03:10:52 +02:00
Ines Montani
2e453bda92
Update issue links [ci skip] 2021-06-28 03:09:48 +02:00
Ines Montani
bd510fcbf0
Merge pull request #8514 from polm/feature/github-discussions [ci skip] 2021-06-28 11:00:28 +10:00
Paul O'Leary McCann
0d3caa52a6 Update New Issue choices
This uses some new features related to Issue Templates to help direct
more people to Discussions.

1. Change the Discussions option to link to Discussions
2. Add a link to the FAQ
3. Disable blank issues
2021-06-27 14:41:33 +09:00
Paul O'Leary McCann
75569f723a
Merge pull request #8512 from kevinlu1248/master
Updated PyATE syntax to fit spaCy V3 in spaCy universe
2021-06-27 13:56:17 +09:00
Paul O'Leary McCann
f144888793
Merge pull request #8504 from bryant1410/patch-1
Fix typo in comment
2021-06-27 13:51:19 +09:00
Paul O'Leary McCann
894caab475
Merge pull request #8507 from bryant1410/patch-3
Fix typo in `train_cli` docstring
2021-06-27 13:50:48 +09:00
Kevin
1a3e7cc5ef Updated PyATE syntax to fit spaCy V3 2021-06-26 17:52:41 -07:00
Santiago Castro
ee63b2b199
Fix typo in train_cli docstring 2021-06-25 22:45:03 -07:00
Santiago Castro
2e71944e1e
Fix double slash in model release web page 2021-06-25 19:19:10 -07:00
Santiago Castro
a2bc743e47
Fix typo in comment 2021-06-25 18:58:38 -07:00
Adrian Zuber
f5aee0bbdf
Raise custom error in EntityLinker when KB is not set (#8442)
* Raise custom error in EntityLinker when KB is not set

* add contributor agreement

* Update E1018 error message
2021-06-25 23:04:00 +02:00
Matthew Honnibal
f9946154d9
Add SpanCategorizer component (#6747)
* Draft spancat model

* Add spancat model

* Add test for extract_spans

* Add extract_spans layer

* Upd extract_spans

* Add spancat model

* Add test for spancat model

* Upd spancat model

* Update spancat component

* Upd spancat

* Update spancat model

* Add quick spancat test

* Import SpanCategorizer

* Fix SpanCategorizer component

* Import SpanGroup

* Fix span extraction

* Fix import

* Fix import

* Upd model

* Update spancat models

* Add scoring, update defaults

* Update and add docs

* Fix type

* Update spacy/ml/extract_spans.py

* Auto-format and fix import

* Fix comment

* Fix type

* Fix type

* Update website/docs/api/spancategorizer.md

* Fix comment

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Better defense

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Fix labels list

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/ml/extract_spans.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/pipeline/spancat.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Set annotations during update

* Set annotations in spancat

* fix imports in test

* Update spacy/pipeline/spancat.py

* replace MaxoutLogistic with LinearLogistic

* fix config

* various small fixes

* remove set_annotations parameter in update

* use our beloved tupley format with recent support for doc.spans

* bugfix to allow renaming the default span_key (scores weren't showing up)

* use different key in docs example

* change defaults to better-working parameters from project (WIP)

* register spacy.extract_spans.v1 for legacy purposes

* Upd dev version so can build wheel

* layers instead of architectures for smaller building blocks

* Update website/docs/api/spancategorizer.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Update website/docs/api/spancategorizer.md

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Include additional scores from overrides in combined score weights

* Parameterize spans key in scoring

Parameterize the `SpanCategorizer` `spans_key` for scoring purposes so
that it's possible to evaluate multiple `spancat` components in the same
pipeline.

* Use the (intentionally very short) default spans key `sc` in the
  `SpanCategorizer`
* Adjust the default score weights to include the default key
* Adjust the scorer to use `spans_{spans_key}` as the prefix for the
  returned score
* Revert addition of `attr_name` argument to `score_spans` and adjust
  the key in the `getter` instead.

Note that for `spancat` components with a custom `spans_key`, the score
weights currently need to be modified manually in
`[training.score_weights]` for them to be available during training. To
suppress the default score weights `spans_sc_p/r/f` during training, set
them to `null` in `[training.score_weights]`.
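As a rough illustration of the naming scheme described above, the sketch below derives the score names from a spans key and builds a weights mapping in which the default `spans_sc_*` entries are set to `None` (the Python counterpart of `null` in the config). The helper names and the weight values are assumptions for this example, not part of spaCy's API.

```python
from typing import Dict, List, Optional


def spancat_score_keys(spans_key: str = "sc") -> List[str]:
    """Score names follow the `spans_{spans_key}` prefix plus _p, _r and _f."""
    return [f"spans_{spans_key}_{metric}" for metric in ("p", "r", "f")]


def build_score_weights(spans_key: str, f_weight: float = 1.0) -> Dict[str, Optional[float]]:
    """Build a mapping for a custom spans key (hypothetical helper)."""
    weights: Dict[str, Optional[float]] = {}
    if spans_key != "sc":
        # Suppress the default scores by setting them to None (null in the config).
        for key in spancat_score_keys("sc"):
            weights[key] = None
    # The weight values here are illustrative only: all weight on the F-score.
    for key in spancat_score_keys(spans_key):
        weights[key] = f_weight if key.endswith("_f") else 0.0
    return weights
```

The resulting entries correspond to what would be written by hand under `[training.score_weights]`.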

* Update website/docs/api/scorer.md

* Fix scorer for spans key containing underscore

* Increment version

* Add Spans to Evaluate CLI (#8439)

* Add Spans to Evaluate CLI

* Change to spans_key

* Add spans per_type output

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

* Fix spancat GPU issues (#8455)

* Fix GPU issues

* Require thinc >=8.0.6

* Switch to glorot_uniform_init

* Fix and test ngram suggester

* Include final ngram in doc for all sizes
* Fix ngrams for docs of the same length as ngram size
* Handle batches of docs that result in no ngrams
* Add tests
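The edge cases listed above can be illustrated with a minimal pure-Python sketch. It is not spaCy's suggester, which works on batches of `Doc` objects; the function name `ngram_spans` and the `(start, end)` tuple output are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def ngram_spans(
    doc_lengths: Sequence[int], sizes: Sequence[int]
) -> List[List[Tuple[int, int]]]:
    """Return, for each doc, all (start, end) token offsets for the given n-gram sizes."""
    batch = []
    for length in doc_lengths:
        spans = []
        for size in sizes:
            # range(length - size + 1) includes the final n-gram ending at the
            # last token and yields exactly one span when length == size.
            # If size > length, the range is empty and no spans are added.
            for start in range(length - size + 1):
                spans.append((start, start + size))
        batch.append(spans)
    return batch
```

A batch in which every doc is shorter than the smallest n-gram size simply produces a list of empty lists, so callers do not need a special case for "no ngrams".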

Co-authored-by: Ines Montani <ines@ines.io>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Nirant <NirantK@users.noreply.github.com>
2021-06-24 12:35:27 +02:00
Adriane Boyd
172dfec4f2
Test download in CI with ca_core_news_sm (#8493) 2021-06-24 09:26:30 +02:00