Paul O'Leary McCann
4a9dc00d86
Use relative indices for mentions
...
Was using batch absolute indices to manage mentions, but extract_spans
expects doc-relative ones.
2021-07-14 18:36:18 +09:00
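The index mismatch described in 4a9dc00d86 can be illustrated with a small sketch. It assumes mentions are stored as (start, end) token offsets into a batch made by concatenating documents. The function name `to_doc_relative` and the data layout are hypothetical and are not taken from the spaCy code.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def to_doc_relative(batch_spans: List[Span], doc_lengths: List[int]) -> List[List[Span]]:
    """Convert spans indexed over a concatenated batch into per-doc spans."""
    per_doc: List[List[Span]] = [[] for _ in doc_lengths]
    # Offset of each doc's first token within the concatenated batch.
    offsets = []
    total = 0
    for length in doc_lengths:
        offsets.append(total)
        total += length
    for start, end in batch_spans:
        for i, (offset, length) in enumerate(zip(offsets, doc_lengths)):
            if offset <= start and end <= offset + length:
                # Subtracting the doc offset gives the doc-relative span.
                per_doc[i].append((start - offset, end - offset))
                break
        else:
            raise ValueError(f"span {(start, end)} crosses a doc boundary")
    return per_doc
```

With two docs of 3 and 4 tokens, the batch span (4, 7) belongs to the second doc and becomes (1, 4) there.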
Paul O'Leary McCann
3684f7fdfd
Remove comment from fixed test
2021-07-14 18:22:14 +09:00
Paul O'Leary McCann
f1796e4af7
Fix mention list bug
...
There was an off-by-one error in how mentions are generated that would
affect mentions at the end of a sentence. This was pretty nasty.
2021-07-14 18:19:00 +09:00
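The commit message for f1796e4af7 does not say which bound was wrong, so the following is only one plausible shape of such a bug. It sketches candidate-span enumeration with exclusive end offsets. `candidate_mentions` and its parameters are hypothetical names.

```python
from typing import List, Tuple


def candidate_mentions(sent_start: int, sent_end: int, max_width: int) -> List[Tuple[int, int]]:
    """Enumerate every span of up to max_width tokens inside one sentence.

    sent_end and span ends are exclusive, so a span may end AT sent_end.
    """
    spans = []
    for start in range(sent_start, sent_end):
        last_end = min(start + max_width, sent_end)
        # The largest valid exclusive end is last_end itself, hence the + 1.
        # Using range(start + 1, last_end) instead would silently drop every
        # mention that finishes on the final token of the sentence.
        for end in range(start + 1, last_end + 1):
            spans.append((start, end))
    return spans
```

For a three-token sentence and `max_width=2`, the spans (1, 3) and (2, 3) both end on the last token and are exactly the ones an off-by-one in the inner range would lose.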
Paul O'Leary McCann
80a17071d3
Remove unused code
2021-07-11 18:46:39 +09:00
Paul O'Leary McCann
447c7070e3
Fix loss
...
Accidentally deleted it
2021-07-10 22:45:25 +09:00
Paul O'Leary McCann
c25ec292a9
Cleanup
2021-07-10 22:42:55 +09:00
Paul O'Leary McCann
e00bd422d9
Fix span embeds
...
Some of the lengths and backprop weren't right.
Also various cleanup.
2021-07-10 21:38:53 +09:00
Paul O'Leary McCann
d7d317a1b5
Clean up span embedding code
...
This is now cleaner and significantly faster. There are still some messy
parts in the code (particularly variable names); will get to that later.
2021-07-10 19:59:08 +09:00
Paul O'Leary McCann
dc1f974d39
Merge branch 'master' into feature/coref
2021-07-10 18:10:40 +09:00
Paul O'Leary McCann
f34915c1e8
Use scatter_add to speed up span embed backprop
...
This was the slowest part of the code, and using scatter_add here
probably reduces the runtime by 50%.
2021-07-10 18:08:51 +09:00
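As a rough illustration of the scatter-add idea in f34915c1e8: when several span positions read from the same token row in the forward pass, the backward pass has to sum their gradients back into that row. The sketch below uses NumPy's `np.add.at` as a stand-in for a scatter-add operation. The function names and array layout are assumptions, not the code from the branch.

```python
import numpy as np


def backprop_loop(d_spans: np.ndarray, token_ids: np.ndarray, n_tokens: int) -> np.ndarray:
    """Reference version: add each gradient row to its token, one at a time."""
    d_tokens = np.zeros((n_tokens, d_spans.shape[1]), dtype=d_spans.dtype)
    for row, tok in zip(d_spans, token_ids):
        d_tokens[tok] += row
    return d_tokens


def backprop_scatter(d_spans: np.ndarray, token_ids: np.ndarray, n_tokens: int) -> np.ndarray:
    """Same result in a single vectorized call."""
    d_tokens = np.zeros((n_tokens, d_spans.shape[1]), dtype=d_spans.dtype)
    # np.add.at accumulates over repeated indices. A plain
    # `d_tokens[token_ids] += d_spans` would keep only one contribution
    # per repeated index, which gives wrong gradients.
    np.add.at(d_tokens, token_ids, d_spans)
    return d_tokens
```

Both functions return the same array. The second avoids the Python-level loop, which is where the time goes when there are many spans.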
Ines Montani
616f4de034
Merge pull request #8674 from polm/fix/autoblack-no-forks [ci skip]
...
Make the autoblack job not run on forks
2021-07-10 16:41:59 +10:00
Paul O'Leary McCann
b8cdbb4bb6
Make the autoblack job not run on forks
...
The autoblack job is an occasional cleanup job. If it runs on forks and
those PRs are accepted, the git history will be weird, and that doesn't
help anyone.
The way to make the job not run on forks is a little non-obvious, but it is
based on this thread:
https://github.com/prisma/prisma/issues/3539
2021-07-10 15:38:20 +09:00
Ines Montani
d4fecdfb82
Merge pull request #8665 from rynoV/patch-1 [ci skip]
2021-07-10 10:52:15 +10:00
Ines Montani
50000d37e4
Avoid double parentheses [ci skip]
2021-07-10 10:52:01 +10:00
Calum Sieppert
e2d53aa1a6
Typo fixes
2021-07-09 10:25:56 -06:00
Adriane Boyd
d8805a1073
Fix ru/uk lemmatizer mp with spawn ( #8657 )
...
Use an instance variable instead of a class variable for the morphological
analyzer so that multiprocessing with spawn is possible.
2021-07-09 15:36:56 +02:00
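A minimal sketch of why the choice in d8805a1073 can matter. With the spawn start method, a worker starts from a fresh interpreter, and objects sent to it are pickled. By default, pickling copies an instance's own attributes, while a class attribute assigned at runtime in the parent is not part of that state. All class names below are hypothetical stand-ins, and the real lemmatizer code may differ.

```python
from typing import Any, Dict, Optional


class FakeAnalyzer:
    """Stand-in for an expensive morphological analyzer (hypothetical)."""

    def parse(self, word: str) -> str:
        return word.lower()


class ClassLevelLemmatizer:
    # Shared slot, filled in at runtime by the first instance created.
    _morph: Optional[FakeAnalyzer] = None

    def __init__(self) -> None:
        if ClassLevelLemmatizer._morph is None:
            ClassLevelLemmatizer._morph = FakeAnalyzer()


class InstanceLevelLemmatizer:
    def __init__(self) -> None:
        # Stored on the instance, so it is part of the object's own state.
        self._morph = FakeAnalyzer()


def instance_state(obj: Any) -> Dict[str, Any]:
    """Return the per-instance attributes, i.e. what default pickling copies."""
    return dict(vars(obj))
```

`instance_state(ClassLevelLemmatizer())` is empty, so nothing about the analyzer travels with the object. `instance_state(InstanceLevelLemmatizer())` contains `_morph`.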
Adriane Boyd
b8e720fdb9
Fix Azerbaijani init, extend lang init tests ( #8656 )
...
* Extend langs in initialize tests
* Fix az init
2021-07-09 15:36:35 +02:00
Ines Montani
1c0ed22d1e
Merge pull request #8573 from julien-talkair/code-quality-pre-commit
2021-07-09 23:09:24 +10:00
Ines Montani
bbca56687f
Merge pull request #8655 from explosion/autoblack
...
Auto-format code with black
2021-07-09 23:08:05 +10:00
explosion-bot
334f1f98d8
Auto-format code with black
2021-07-09 08:06:06 +00:00
Adriane Boyd
1ee5bee29d
Add Macedonian models to website ( #8637 )
2021-07-08 09:32:14 +02:00
Paul O'Leary McCann
d0b041aff4
Switch to using Thinc tuplify
...
The tuplify code here was added to Thinc proper and that's been
released, so no need to have it here any more.
2021-07-08 16:08:36 +09:00
Paul O'Leary McCann
1d9209d43a
Merge pull request #8547 from mylibrar/update-universe
...
Add forte to universe.json
2021-07-08 14:59:49 +09:00
Ines Montani
39c8f7949e
Add code preview for textcat_multilabel [ci skip]
2021-07-08 13:33:25 +10:00
Ines Montani
bcd2be40b5
Merge pull request #8634 from rynoV/patch-1 [ci skip]
2021-07-08 12:52:59 +10:00
Calum Sieppert
889c187bc2
Typo fixes
2021-07-07 16:53:04 -06:00
julien-talkair
833f7f2918
👷 configure flake8 pre-commit
...
* uses setup.cfg for flake8 configuration during pre-commit
2021-07-07 21:31:46 +02:00
Ines Montani
530b5d72f6
Merge pull request #8624 from adrianeboyd/docs/v3-1-usage-updates [ci skip]
...
Update v3.1 usage docs
2021-07-07 16:50:36 +10:00
Adriane Boyd
6db647dfe0
Update v3.1 usage docs
2021-07-07 08:43:33 +02:00
Sofie Van Landeghem
64fac754fe
add spacy prefix to ngram_suggester.v1 ( #8623 )
2021-07-07 08:09:30 +02:00
julien-talkair
82b01964fa
🚨 adjust flake8 sensitivity
...
* pass arguments to flake8
* reproduce arguments from CI config
2021-07-06 22:41:54 +02:00
Sofie Van Landeghem
733e8ceea9
fix spancat initialize with labels ( #8620 )
2021-07-06 19:08:25 +02:00
Sofie Van Landeghem
608fc1d623
avoid msg var implicitness ( #8619 )
...
* avoid msg var implicitness
* rename local msg
* Add CI tests for debug data and train
* Adjust debug data CLI test
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-07-06 19:08:08 +02:00
Sofie Van Landeghem
e7d747e3ee
TransitionBasedParser.v1 to legacy ( #8586 )
...
* TransitionBasedParser.v1 to legacy
* register sublayers
* bump spacy-legacy to 3.0.7
2021-07-06 15:26:45 +02:00
Ines Montani
04a9ade40f
Merge pull request #8466 from explosion/docs/new-in-v3-1 [ci skip]
2021-07-06 22:20:24 +10:00
Luca Dorigo
e8ef4a46d5
Add the right return type for Language.pipe and an overload for the as_tuples case ( #8441 )
...
* Add the right return type for Language.pipe and an overload for the as_tuples version
* Reformat, tidy up
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-07-06 14:18:40 +02:00
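The typing pattern named in e8ef4a46d5 can be sketched with `typing.overload`. This is a simplified, free-standing function with a toy `Doc` class. The real `Language.pipe` signature has more parameters and may be annotated differently.

```python
from typing import Any, Iterable, Iterator, Literal, Tuple, overload


class Doc:
    """Toy stand-in for a processed document (hypothetical)."""

    def __init__(self, text: str) -> None:
        self.text = text


@overload
def pipe(texts: Iterable[str], *, as_tuples: Literal[False] = ...) -> Iterator[Doc]: ...


@overload
def pipe(
    texts: Iterable[Tuple[str, Any]], *, as_tuples: Literal[True]
) -> Iterator[Tuple[Doc, Any]]: ...


def pipe(texts, *, as_tuples=False):
    """Yield Docs, or (Doc, context) pairs when as_tuples is True."""
    if as_tuples:
        for text, context in texts:
            yield Doc(text), context
    else:
        for text in texts:
            yield Doc(text)
```

A type checker can then infer `Iterator[Doc]` for `pipe(["a"])` and `Iterator[Tuple[Doc, Any]]` for `pipe([("a", 1)], as_tuples=True)`, instead of a single loose return type for both call styles.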
Sofie Van Landeghem
b9f59118bf
Fix silent evaluation ( #8581 )
...
* fix silentness
* sneak in docs typo fix
* pass silent boolean instead
2021-07-06 14:16:19 +02:00
Sofie Van Landeghem
3daf57d70c
Small spancat fixes ( #8614 )
...
* two small fixes + additional tests
* rename
2021-07-06 14:15:41 +02:00
Ines Montani
327f83573a
Move scores per type handling into util function ( #8590 )
2021-07-06 13:02:37 +02:00
Adriane Boyd
5fd0b5207e
Fix vectors check for sourced components ( #8559 )
...
* Fix vectors check for sourced components
Since vectors are not loaded when components are sourced, store a hash
for the vectors of each sourced component and compare it to the loaded
vectors after the vectors are loaded from the `[initialize]` block.
* Pop temporary info
* Remove stored hash in remove_pipe
* Add default for pop
* Add additional convert/debug/assemble CLI tests
2021-07-06 12:43:17 +02:00
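The hash-then-compare approach described in 5fd0b5207e might look roughly like the sketch below. The helper names, the use of SHA-256, and the `"sourced_hashes"` key are all assumptions made for illustration and do not reflect spaCy's actual storage format.

```python
import hashlib
from typing import Dict, List


def vectors_hash(data: bytes) -> str:
    """Fingerprint a serialized vectors table."""
    return hashlib.sha256(data).hexdigest()


def check_sourced_vectors(sourced_hashes: Dict[str, str], loaded: bytes) -> List[str]:
    """Return the sourced components whose original vectors differ from the loaded ones."""
    loaded_hash = vectors_hash(loaded)
    return [name for name, h in sourced_hashes.items() if h != loaded_hash]


# Record a hash per component at sourcing time, when vectors are not loaded yet.
meta = {"sourced_hashes": {"ner": vectors_hash(b"v1"), "tagger": vectors_hash(b"v2")}}

# After the vectors are loaded, pop the temporary info and compare.
# The default of {} keeps this safe when nothing was sourced.
mismatched = check_sourced_vectors(meta.pop("sourced_hashes", {}), b"v1")
```

Here `mismatched` names only `tagger`, and the temporary entry no longer remains in `meta`.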
Adriane Boyd
29906884c5
Raise an error for textcat with <2 labels ( #8584 )
...
* Raise an error for textcat with <2 labels
Raise an error if initializing a `textcat` component without at least
two labels.
* Add similar note to docs
* Update positive_label description in API docs
2021-07-06 12:35:22 +02:00
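One way to see why 29906884c5 rejects fewer than two labels, assuming the component scores mutually exclusive classes with a softmax: with a single class the output is 1.0 whatever the input, so there is nothing to learn. The function names and the error message below are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def check_exclusive_labels(labels: Sequence[str]) -> List[str]:
    """Fail early instead of training a classifier that cannot learn."""
    unique = sorted(set(labels))
    if len(unique) < 2:
        raise ValueError(
            f"need at least two labels for exclusive classification, got {unique}"
        )
    return unique
```

`softmax([-3.2])` and `softmax([7.5])` both return `[1.0]`, which is the degenerate case the check guards against.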
Ines Montani
5bb7fe4b41
Update with HF hub integration [ci skip]
2021-07-06 19:30:59 +10:00
Paul O'Leary McCann
3b1d5350d0
Merge pull request #8609 from mathcass/model-documentation-typo
...
Fix a command typo in models.md
2021-07-06 14:43:58 +09:00
Cass
7d13fc799b
Fix a command typo in models.md
...
"dowmload" -> "download"
2021-07-05 18:44:18 -07:00
Paul O'Leary McCann
eb5820b593
Improve take_vecs implementation
...
This pulls out references to needed bits so that other parts (the larger
embeddings) can be freed before backprop.
2021-07-05 21:08:42 +09:00
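The memory point in eb5820b593 can be sketched as a row-selection layer whose backprop callback captures only the small values it needs. This is a plain-Python illustration with hypothetical names and is not the actual `take_vecs` code.

```python
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]


def take_rows(table: Matrix, idxs: Sequence[int]) -> Tuple[Matrix, Callable[[Matrix], Matrix]]:
    """Select rows of `table`; return them together with a backprop callback."""
    out = [list(table[i]) for i in idxs]
    # Copy out only what the backward pass needs. The closure below must not
    # mention `table`, or it would keep the whole (large) input alive.
    n_rows = len(table)
    width = len(table[0]) if table else 0
    kept_idxs = list(idxs)

    def backprop(d_out: Matrix) -> Matrix:
        # Gradient of a row lookup: sum each output gradient into its source row.
        d_table = [[0.0] * width for _ in range(n_rows)]
        for i, d_row in zip(kept_idxs, d_out):
            for j, value in enumerate(d_row):
                d_table[i][j] += value
        return d_table

    return out, backprop
```

Because `backprop` closes over `n_rows`, `width` and `kept_idxs` only, the caller is free to drop its reference to `table` before the backward pass runs.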
Paul O'Leary McCann
13bef2ddb6
Add width prior feature
...
Not necessary for convergence, but in coref-hoi this seems to add a few
F1 points.
Note that there are two width-related features in coref-hoi. This is a
"prior" that is added to mention scores. The other width related feature
is appended to the span embedding representation for other layers to
reference.
2021-07-05 21:06:28 +09:00
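A minimal sketch of a width prior added to mention scores, as described in 13bef2ddb6. In a trained model the prior would be learned. Here it is a fixed lookup table, and the bucketing of long spans into the last entry is an assumption made for illustration.

```python
from typing import List, Sequence, Tuple


def add_width_prior(
    mention_scores: Sequence[float],
    spans: Sequence[Tuple[int, int]],
    width_prior: Sequence[float],
) -> List[float]:
    """Add a per-width bias to each mention score.

    width_prior[k] is the bias for spans of width k + 1.
    """
    scored = []
    for score, (start, end) in zip(mention_scores, spans):
        width = end - start  # end is exclusive, so width >= 1
        # Spans wider than the table share its last entry.
        bucket = min(width, len(width_prior)) - 1
        scored.append(score + width_prior[bucket])
    return scored
```

This covers only the score prior. The second width feature mentioned in the commit, the one appended to the span embedding, is a separate input and is not shown here.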
Ines Montani
8423864b50
Add docs notes on installing models from Python and in Jupyter [ci skip] ( #8597 )
2021-07-05 13:49:20 +02:00
Paul O'Leary McCann
8f66176b2d
Fix loss?
...
This rewrites the loss to not use the Thinc crossentropy code at all.
The main difference here is that the negative predictions are being
masked out (= marginalized over), but negative gradient is still being
reflected.
I'm still not sure this is exactly right but models seem to train
reliably now.
2021-07-05 18:17:10 +09:00
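The commit message for 8f66176b2d leaves the exact formulation open, so the following is only one common reading of "marginalizing" in coreference models: minimize the negative log of the total softmax probability assigned to the gold antecedents. Whether this matches the rewritten loss is an assumption, and `marginal_loss` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple


def marginal_loss(scores: Sequence[float], is_gold: Sequence[bool]) -> Tuple[float, List[float]]:
    """Negative marginal log-likelihood over gold candidates, with its gradient."""
    if not any(is_gold):
        raise ValueError("at least one candidate must be gold")
    # Numerically stable softmax over all candidates.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Probability mass on the gold candidates, summed (marginalized) together.
    gold_mass = sum(p for p, g in zip(probs, is_gold) if g)
    loss = -math.log(gold_mass)
    # d loss / d score_i = p_i - p_i / gold_mass for gold candidates, p_i otherwise.
    # Non-gold candidates still receive a gradient that pushes their scores down.
    grad = [p - (p / gold_mass if g else 0.0) for p, g in zip(probs, is_gold)]
    return loss, grad
```

With four equal scores and two gold candidates, the loss is log 2 and the gradient is -0.25 for each gold candidate and +0.25 for each non-gold one.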
Ines Montani
15108cd930
Merge pull request #8593 from yohasebe/patch-1 [ci skip]
2021-07-05 11:31:38 +10:00
Ines Montani
fdcd4003e5
Merge pull request #8592 from yohasebe/patch-2 [ci skip]
...
Adds contributor agreement yohasebe.md
2021-07-05 11:27:43 +10:00