Vito De Tullio
3672464e25
applying suggestion to avoid mypy errors ( #8265 )
...
* applying suggestion to avoid mypy errors
* sign contributor agreement
2021-06-02 19:25:30 +10:00
Adriane Boyd
4aa1a7d5a3
Remove unsupported attrs from attrs.IDS ( #8132 )
...
The attributes `PROB`, `CLUSTER` and `SENT_END` are not supported by
`Lexeme.get_struct_attr` so should not be included through `attrs.IDS`
as supported attributes in `Doc.to_array` and other methods.
2021-06-02 19:16:57 +10:00
Paul O'Leary McCann
d54631f68b
Fix other open calls without context managers ( #8245 )
2021-05-31 19:04:29 +10:00
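A minimal sketch of the pattern this commit title describes. The helper name `read_text` and the sample file are illustrative assumptions, not code from the spaCy repository.

```python
import tempfile
from pathlib import Path


def read_text(path):
    # Pattern being replaced: `open(path).read()` leaves the handle open
    # until the garbage collector gets to it.
    # The context manager closes the file on exit, even if read() raises.
    with open(path, encoding="utf8") as file_:
        return file_.read()


# Illustrative usage with a throwaway file.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample = Path(tmp_dir) / "sample.txt"
    sample.write_text("hello", encoding="utf8")
    content = read_text(sample)
```

The same shape applies to any `open` call whose result is used once and then dropped.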
Paul O'Leary McCann
5aba213349
Fix skweak Github URL
...
The GitHub entry should contain just user/repo, not a URL
2021-05-31 18:00:43 +09:00
Kristian Boda
0035db4103
Add hmrb to spaCy Universe ( #8129 )
...
* docs: add hmrb to spacy universe
* docs: add sentence on spacy versions
* docs: update description and images
* misc: add spaCy Contributor Agreement
2021-05-31 10:41:34 +02:00
Kristian Boda
dc8d8d15d2
Add hmrb to spaCy Universe ( #8129 )
...
* docs: add hmrb to spacy universe
* docs: add sentence on spacy versions
* docs: update description and images
* misc: add spaCy Contributor Agreement
2021-05-31 18:40:48 +10:00
Dhruv Naik
283f64a98d
Fix EntityRuler bug: ent_ids returns None for phrases ( #8169 )
...
* bugfix for explosion/spaCy#8168
* add test for explosion/spaCy#8168
2021-05-31 18:38:53 +10:00
Michael K
b0467d2972
Add project urls to package metadata ( #7728 )
...
This adds the links to PyPI. To see that in action, check out
https://pypi.org/project/Django/ (source code: b8c9e9fae1/setup.cfg (L27-L32)).
2021-05-31 18:38:29 +10:00
Narayan Acharya
6b79714080
Address missing config overrides post load of models ( #8208 )
2021-05-31 18:36:52 +10:00
Sofie Van Landeghem
fff662e41f
Ensemble textcat with listener ( #8012 )
...
* add unit test for two listeners, with a textcat ensemble in the middle
* return zero gradients instead of None in accumulate_gradient
2021-05-31 18:21:06 +10:00
Sofie Van Landeghem
ff91e6dac7
Show warning if entity_ruler runs without patterns ( #7807 )
...
* Show warning if entity_ruler runs without patterns
* Show warning if matcher runs without patterns
* fix wording
* unit test for warning once (WIP)
* warn W036 only once
* cleanup
* create filter_warning helper
2021-05-31 18:20:27 +10:00
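One way to get "warn only once" behavior with the standard library is sketched below. The warning class, the message, and `run_matcher` are hypothetical stand-ins; this is not spaCy's `filter_warning` helper or its W036 definition.

```python
import warnings


class NoPatternsWarning(UserWarning):
    """Hypothetical stand-in for the 'no patterns' warning."""


def run_matcher(patterns, text):
    # Warn when the component is called without any patterns defined.
    if not patterns:
        warnings.warn("The component was called without any patterns.", NoPatternsWarning)
    return [pattern for pattern in patterns if pattern in text]


with warnings.catch_warnings(record=True) as caught:
    # "once" shows a matching warning the first time only, wherever it is raised.
    warnings.filterwarnings("once", category=NoPatternsWarning)
    for _ in range(3):
        run_matcher([], "some text")

n_warnings = len(caught)
```

Here the matcher runs three times without patterns, but only the first call produces a recorded warning.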
Paul O'Leary McCann
d1a221a374
Add all symbols in Unicode Currency Symbols block ( #8212 )
...
* Add all symbols in Unicode Currency Symbols block
In #8102 it came up that the rupee symbol was treated differently from
the dollar / euro / yen symbols. This adds many symbols not already
included.
* Fix test
* Fix training test
2021-05-31 18:03:40 +10:00
Paul O'Leary McCann
04239e94c7
Use a context manager when reading model ( fix #7036 ) ( #8244 )
2021-05-31 17:36:17 +10:00
Sofie Van Landeghem
fc37715cfb
ensure 'spacy ray' works ( #7799 )
...
* ensure 'spacy ray' works
* better fix by changing entry point
2021-05-28 18:15:31 +02:00
svlandeg
0aa1083ce8
avoid repetitive entities in the output
2021-05-28 16:52:51 +02:00
svlandeg
0d81bce9cc
add failing test for too short a sentence
2021-05-28 15:10:35 +02:00
svlandeg
0f5c586e2f
add basic tests for debugging
2021-05-28 14:19:55 +02:00
Ines Montani
5957ab74f7
Merge pull request #8112 from svlandeg/bugfix/replace-trf
2021-05-28 11:35:17 +10:00
svlandeg
391b512afd
fix types of fwd functions
2021-05-27 16:36:46 +02:00
svlandeg
04b55bf054
removing unused imports
2021-05-27 16:31:38 +02:00
svlandeg
910026582d
set versions to v1 instead of v0
2021-05-27 16:17:20 +02:00
svlandeg
2e3c0e2256
delete outdated tests
2021-05-27 13:54:31 +02:00
svlandeg
ba2e491cc4
Merge remote-tracking branch 'upstream/master' into feature/coref
2021-05-27 13:50:32 +02:00
Sofie Van Landeghem
4b81f58eda
fix docs ( #8200 )
2021-05-27 10:50:46 +02:00
Sofie Van Landeghem
3c58c0323f
fix docs ( #8200 )
2021-05-27 10:48:59 +02:00
Sofie Van Landeghem
290bd6ed39
ensure tolerance is properly passed on ( #8158 )
2021-05-27 18:10:28 +10:00
Paul O'Leary McCann
ee62344970
Fix skweak Github URL
...
The GitHub entry should contain just user/repo, not a URL
2021-05-24 20:31:43 +09:00
Paul O'Leary McCann
68ccfc4c39
Fix docs ( fix #8189 )
2021-05-24 19:49:21 +09:00
Paul O'Leary McCann
0c553ecd4e
Fix docs ( fix #8189 )
2021-05-24 19:47:30 +09:00
Paul O'Leary McCann
a484245f35
Remove references to coref_er
2021-05-24 19:08:45 +09:00
Paul O'Leary McCann
d6389b133d
Don't use a generator for no reason
2021-05-24 19:06:15 +09:00
Paul O'Leary McCann
d6fd5fe1c0
Minor cleanup
2021-05-24 14:56:43 +09:00
Paul O'Leary McCann
0942a0b51b
Remove coref_er.py
...
The intent of this was that it would be a component pipeline that used
entities as input, but that's now covered by the get_mentions function
as a pipeline arg.
2021-05-21 18:20:25 +09:00
Paul O'Leary McCann
f6652c9252
Add new coref scoring
...
This is closer to the traditional evaluation method, which uses an
average of three scores. For now this uses only the bcubed metric
(nothing special about bcubed, just picked one).
The scoring implementation comes from the coval project. It relies on
scipy, which is one issue, and is rather involved, which is another.
Besides being comparable with traditional evaluations, this scoring is
relatively fast.
2021-05-21 15:56:40 +09:00
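For orientation, the bcubed metric can be sketched in a few lines. This is a simplified textbook version, not the coval implementation: it assumes the gold and predicted clusterings cover exactly the same mentions, and the function name `b_cubed` is made up for illustration.

```python
def b_cubed(key_clusters, response_clusters):
    """Return (precision, recall, f1) for two clusterings of the same mentions."""
    key_of = {m: frozenset(c) for c in key_clusters for m in c}
    response_of = {m: frozenset(c) for c in response_clusters for m in c}
    mentions = list(key_of)
    precision = 0.0
    recall = 0.0
    for mention in mentions:
        # Overlap between the gold and predicted cluster containing this mention.
        overlap = len(key_of[mention] & response_of[mention])
        precision += overlap / len(response_of[mention])
        recall += overlap / len(key_of[mention])
    precision /= len(mentions)
    recall /= len(mentions)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = [{"a", "b", "c"}, {"d"}]
predicted = [{"a", "b"}, {"c", "d"}]
precision, recall, f1 = b_cubed(gold, predicted)
```

Each mention contributes its own precision and recall, so large clusters do not dominate the score the way they can in link-based metrics.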
Paul O'Leary McCann
e1b4a85bb9
Fix loss
...
The loss was being returned as a single element array, which caused
training to die when it attempted to turn it into JSON.
2021-05-21 15:46:50 +09:00
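The failure mode described here can be reproduced in miniature. In this sketch a standard-library `array` stands in for the single-element array (an assumption; the real loss came from the training code), and the fix shown is simply converting to a plain float before serializing.

```python
import json
from array import array

# Stand-in for a loss that comes back as a one-element array.
loss = array("d", [0.25])

try:
    json.dumps({"loss": loss})
    serializable = True
except TypeError:
    # Array types are not JSON serializable by default.
    serializable = False

# Converting to a plain Python float makes the value serializable.
fixed = json.dumps({"loss": float(loss[0])})
```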
Adriane Boyd
cd6bd91c3a
Switch default train corpus max_length to 0 in quickstart ( #8142 )
...
The behavior of `spacy.Corpus.v1` is unexpected enough for
`max_length != 0` that `0` is a better default for users creating a new
config with the quickstart.
With a non-zero `max_length`, documents are skipped, sometimes the
entire corpus is skipped, and sometimes documents are (quite
unexpectedly for your average user) split into sentences.
2021-05-20 14:48:09 +02:00
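The behavior this message describes can be sketched roughly as follows. This is a toy model built from the description above, not the `spacy.Corpus.v1` source: documents are plain dicts, the function name is hypothetical, and the strict `<` comparison is an assumption.

```python
def iter_training_texts(docs, max_length=0):
    """Yield token lists the way the commit message describes the reader behaving."""
    for doc in docs:
        tokens, sents = doc["tokens"], doc["sents"]
        if max_length == 0 or len(tokens) < max_length:
            # No limit, or the document is short enough: keep it whole.
            yield tokens
        elif sents is not None:
            # Too long, but sentence boundaries exist: split into sentences.
            for sent in sents:
                if len(sent) < max_length:
                    yield sent
        # Too long and no sentence boundaries: the document is skipped.


docs = [
    {"tokens": list("abcdef"), "sents": [list("abc"), list("def")]},
    {"tokens": list("ghijkl"), "sents": None},
]
unlimited = list(iter_training_texts(docs, max_length=0))
limited = list(iter_training_texts(docs, max_length=4))
```

With `max_length=0` both documents come through intact; with `max_length=4` the first is split into two sentences and the second disappears silently.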
Paul O'Leary McCann
ff3fed06cf
Catch a stray reference
2021-05-20 21:30:46 +09:00
Sofie Van Landeghem
202943bc8c
KB & NEL to/from bytes ( #8113 )
...
* unit test for pickling KB
* add pickling test for NEL
* KB to_bytes and from_bytes
* NEL to_bytes and from_bytes
* xfail pickle tests for now
* fix docs
* cleanup
2021-05-20 18:11:30 +10:00
Paul O'Leary McCann
8c5df622d8
Help out python gc in coref backprop
2021-05-20 16:40:55 +09:00
Paul O'Leary McCann
fa92daf052
Break pairwise operations into pseudolayers
...
This makes their scope tighter and more contained, and has the nice side
effect that fewer things need to be passed around for backprop.
2021-05-20 15:59:51 +09:00
Adriane Boyd
4e69fcaa50
Disable GPU CI tests ( #8143 )
2021-05-19 12:00:31 +02:00
Adriane Boyd
f6128c06b0
Disable GPU CI tests ( #8143 )
2021-05-19 12:00:07 +02:00
Paul O'Leary McCann
d22acee4f7
Fix backprop
...
Training seems to actually run now!
2021-05-18 20:09:27 +09:00
Paul O'Leary McCann
2486b8ad4d
Fix pipeline initialize
2021-05-18 19:56:27 +09:00
Paul O'Leary McCann
0620820857
Deal with generators in tuplify
2021-05-18 19:55:52 +09:00
Paul O'Leary McCann
a7d9c8156d
Make get_sentence_map work with init
...
When sentences are not available, just treat the whole doc as one
sentence. A reasonable general fallback, but important due to the init
call, where upstream components aren't run.
2021-05-18 19:54:54 +09:00
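The fallback described here can be sketched as below. The function name `sentence_map` and the flag-list input are assumptions for illustration; the actual `get_sentence_map` works on a spaCy `Doc`.

```python
def sentence_map(sent_starts):
    """Map each token index to a sentence index.

    `sent_starts` holds True/False per token, or None where the
    sentence boundary is unknown.
    """
    if any(flag is None for flag in sent_starts):
        # Sentences not available: treat the whole doc as one sentence.
        return [0] * len(sent_starts)
    mapping = []
    sent_id = -1
    for flag in sent_starts:
        # Start a new sentence on a start flag (or on the very first token).
        if flag or sent_id < 0:
            sent_id += 1
        mapping.append(sent_id)
    return mapping


with_sents = sentence_map([True, False, True, False, False])
without_sents = sentence_map([None, None, None])
```

The second call mirrors the init-time case, where upstream components have not run and no boundaries are set.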
Paul O'Leary McCann
883c137b26
Add basic tuplify init
2021-05-18 19:53:59 +09:00
Paul O'Leary McCann
051715506e
Fiddle with get_mentions definition
...
Ended up not making a difference, but oh well.
2021-05-18 19:53:33 +09:00
Adriane Boyd
06324e5a5e
Update pydantic requirements ( #8127 )
...
Update pydantic requirements following
https://github.com/explosion/thinc/pull/499
2021-05-18 11:35:50 +02:00
Paul O'Leary McCann
a33d29441a
Merge remote-tracking branch 'upstream/develop' into feature/coref
2021-05-18 17:00:17 +09:00