René Octavio Queiroz Dias
59271e887a
fix: TransformerListener with TextCatEnsemble (#6951)
...
* bug: Regression test
Issue #6946
* fix: Fix issue #6946
* chore: Remove regression test
2021-02-06 13:44:51 +01:00
Matthew Honnibal
ffc371350a
Avoid assuming encode.get_dim('nO') is set in tok2vec (#6800)
2021-01-24 14:37:33 +11:00
Sofie Van Landeghem
c8761b0e6e
rewrite Maxout layer as separate layers to avoid shape inference trouble (#6760)
2021-01-19 07:37:17 +08:00
Adriane Boyd
26c34ab8b0
Fix parser resizing for cupy (#6758)
2021-01-18 20:43:15 +01:00
Matthew Honnibal
c2a18e4fa3
Update textcat ensemble model
2021-01-19 02:53:02 +11:00
Ines Montani
a203e3dbb8
Support spacy-legacy via the registry
2021-01-15 21:42:40 +11:00
Ines Montani
b0b743597c
Tidy up and auto-format
2021-01-15 11:57:36 +11:00
Sofie Van Landeghem
75d9019343
Fix types of Tok2Vec encoding architectures (#6442)
...
* fix TorchBiLSTMEncoder documentation
* ensure the types of the encoding Tok2vec layers are correct
* update references from v1 to v2 for the new architectures
2021-01-07 16:39:27 +11:00
Sofie Van Landeghem
3983bc6b1e
Fix Transformer width in TextCatEnsemble (#6431)
...
* add convenience method to determine tok2vec width in a model
* fix transformer tok2vec dimensions in TextCatEnsemble architecture
* init function should not be nested to avoid pickle issues
2021-01-06 12:44:04 +01:00
Ines Montani
991669c934
Tidy up and auto-format
2021-01-05 13:41:53 +11:00
Sofie Van Landeghem
282a3b49ea
Fix parser resizing when there is no upper layer (#6460)
...
* allow resizing of the parser model even when upper=False
* update from spacy.TransitionBasedParser.v1 to v2
* bugfix
2020-12-18 18:56:57 +08:00
Sofie Van Landeghem
cfc72c2995
Bugfix multi-label textcat reproducibility (#6481)
...
* add test for multi-label textcat reproducibility
* remove positive_label
* fix lengths dtype
* fix comments
* remove comment that we should not have forgotten :-)
2020-12-09 06:29:15 +08:00
Sofie Van Landeghem
de108ed3e8
Add specific error when StaticVectors can't read the vectors data (#6450)
2020-12-09 06:16:07 +08:00
Sofie Van Landeghem
f98a04434a
pretrain architectures (#6451)
...
* define new architectures for the pretraining objective
* add loss function as attr of the model
* cleanup
* cleanup
* shorten name
* fix typo
* remove unused error
2020-12-08 14:41:03 +08:00
Sofie Van Landeghem
a0c899a0ff
Fix textcat + transformer architecture (#6371)
...
* add pooling to textcat TransformerListener
* maybe_get_dim in case it's null
2020-11-10 20:14:47 +08:00
Sofie Van Landeghem
75a202ce65
TextCat updates and fixes (#6263)
...
* small fix in example imports
* throw error when train_corpus or dev_corpus is not a string
* small fix in custom logger example
* limit macro_auc to labels with 2 annotations
* fix typo
* also create parents of output_dir if need be
* update documentation of textcat scores
* refactor TextCatEnsemble
* fix tests for new AUC definition
* bump to 3.0.0a42
* update docs
* rename to spacy.TextCatEnsemble.v2
* spacy.TextCatEnsemble.v1 in legacy
* cleanup
* small fix
* update to 3.0.0rc2
* fix import that got lost in merge
* cursed IDE
* fix two typos
2020-10-18 14:50:41 +02:00
Sofie Van Landeghem
f8a1c1afd6
avoid dropout at runtime (#6247)
2020-10-13 14:39:59 +02:00
svlandeg
40276fd3be
update NEL docs after latest refactor
2020-10-12 11:41:27 +02:00
svlandeg
08cb085f6c
Merge remote-tracking branch 'upstream/develop' into fix/various
2020-10-09 17:01:27 +02:00
svlandeg
040c7c0541
fix get_dim calls in build_simple_cnn_text_classifier
2020-10-09 15:40:58 +02:00
svlandeg
853edace37
fix MultiHashEmbed example in documentation
2020-10-09 14:11:06 +02:00
Adriane Boyd
39aabf50ab
Also rename to include_static_vectors in CharEmbed
2020-10-09 11:54:48 +02:00
Matthew Honnibal
cfb9770a94
Fix empty input into StaticVectors layer (#6211)
...
* Add test for empty doc(s)
* Fix empty check in staticvectors
* Remove xfail
* Update spacy/ml/staticvectors.py
2020-10-06 14:15:41 +02:00
Ines Montani
1a554bdcb1
Update docs and docstring [ci skip]
2020-10-05 21:55:27 +02:00
Ines Montani
9614e53b02
Tidy up and auto-format
2020-10-05 21:55:18 +02:00
Matthew Honnibal
e50047f1c5
Check lengths match
2020-10-05 20:02:45 +02:00
Matthew Honnibal
cdd2b79b6d
Remove deprecated MultiHashEmbed
2020-10-05 19:58:18 +02:00
Matthew Honnibal
6dcc4a0ba6
Simplify MultiHashEmbed signature
2020-10-05 19:57:45 +02:00
Matthew Honnibal
eb9ba61517
Format
2020-10-05 15:29:49 +02:00
Matthew Honnibal
8ec79ad3fa
Allow configuration of MultiHashEmbed features
...
Update arguments to MultiHashEmbed layer so that the attributes can be
controlled. A kind of tricky scheme is used to allow optional
specification of the rows. I think it's an okay balance between
flexibility and convenience.
2020-10-05 15:22:00 +02:00
Ines Montani
bcd52e5486
Tidy up errors and warnings
2020-10-04 11:16:31 +02:00
Ines Montani
3bc3c05fcc
Tidy up and auto-format
2020-10-03 17:20:18 +02:00
svlandeg
02247cccaf
Merge remote-tracking branch 'upstream/develop' into feature/small-fixes
2020-10-02 20:48:11 +02:00
Matthew Honnibal
6965cdf16d
Fix comment
2020-10-02 17:26:21 +02:00
Ines Montani
af282ae732
Fix import
2020-10-02 01:12:34 +02:00
Ines Montani
e59ecb12c0
Auto-format
2020-10-02 01:12:30 +02:00
Matthew Honnibal
75a1569908
Merge
2020-10-01 23:07:53 +02:00
Matthew Honnibal
300e5a9928
Avoid relying on NORM in default v3 models (#6176)
...
* Allow CharacterEmbed to specify feature
* Default to LOWER in character embed
* Update tok2vec
* Use LOWER, not NORM
2020-10-01 23:05:55 +02:00
Matthew Honnibal
b854bca15c
Default to LOWER in character embed
2020-10-01 22:17:58 +02:00
Matthew Honnibal
684a77870b
Allow CharacterEmbed to specify feature
2020-10-01 22:17:26 +02:00
Sofie Van Landeghem
a22215f427
Add FeatureExtractor from Thinc (#6170)
...
* move featureextractor from Thinc
* Update website/docs/api/architectures.md
Co-authored-by: Ines Montani <ines@ines.io>
* Update website/docs/api/architectures.md
Co-authored-by: Ines Montani <ines@ines.io>
Co-authored-by: Ines Montani <ines@ines.io>
2020-10-01 16:22:48 +02:00
svlandeg
5121972930
add types of Tok2Vec embedding layers
2020-10-01 09:20:09 +02:00
svlandeg
5a9fdbc8ad
state_type as Literal
2020-09-23 17:32:14 +02:00
svlandeg
25b34bba94
throw custom error when state_type is invalid
2020-09-23 16:57:14 +02:00
svlandeg
dd2292793f
'parser' instead of 'deps' for state_type
2020-09-23 16:53:49 +02:00
svlandeg
6c85fab316
state_type and extra_state_tokens instead of nr_feature_tokens
2020-09-23 13:35:09 +02:00
Ines Montani
1114219ae3
Tidy up and auto-format
2020-09-21 10:59:07 +02:00
Adriane Boyd
f3db3f6fe0
Add vectors option to CharacterEmbed (#6069)
...
* Add vectors option to CharacterEmbed
* Update spacy/pipeline/morphologizer.pyx
* Adjust default morphologizer config
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-09-16 17:45:04 +02:00
Ines Montani
1955aaaa20
Merge pull request #6045 from svlandeg/feature/more-layers-docs [ci skip]
2020-09-09 21:46:40 +02:00
Sofie Van Landeghem
cb66ea7400
Remove simple_ner code (#6041)
...
* remove simple_ner code
* remove unused _biluo and _iob files
2020-09-09 16:11:27 +02:00