Ines Montani
|
0a8a124a6e
|
Update docs [ci skip]
|
2020-10-01 12:15:53 +02:00 |
|
Ines Montani
|
44160cd52f
|
Tidy up [ci skip]
|
2020-10-01 10:41:19 +02:00 |
|
Ines Montani
|
381258b75b
|
Merge pull request #6165 from explosion/feature/update-tokenizers-initialize
|
2020-10-01 09:49:47 +02:00 |
|
svlandeg
|
6787e56315
|
print debugging warning before raising error if model not properly initialized
|
2020-10-01 09:21:00 +02:00 |
|
svlandeg
|
5121972930
|
add types of Tok2Vec embedding layers
|
2020-10-01 09:20:09 +02:00 |
|
Ines Montani
|
4b6afd3611
|
Remove English [initialize] default block for now to get tests to pass
|
2020-09-30 23:49:29 +02:00 |
|
Ines Montani
|
6f29f68f69
|
Update errors and make Tokenizer.initialize args less strict
|
2020-09-30 23:48:47 +02:00 |
|
Ines Montani
|
a103ab5f1a
|
Update augmenter lookups and docs
|
2020-09-30 23:03:47 +02:00 |
|
Matthew Honnibal
|
5128298964
|
Add missing augmenter
|
2020-09-30 20:18:45 +02:00 |
|
Matthew Honnibal
|
59294e91aa
|
Restore the 'jsonl' arg for init vectors
The lexemes.jsonl file is still used in our English vectors, and it may
be required by users as well. I think it's worth supporting the option.
|
2020-09-30 19:06:50 +02:00 |
|
Matthew Honnibal
|
c379a4274a
|
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
|
2020-09-30 16:52:42 +02:00 |
|
Matthew Honnibal
|
e58dca3028
|
Add read_labels
|
2020-09-30 16:52:27 +02:00 |
|
Ines Montani
|
23c63eefaf
|
Tidy up env vars [ci skip]
|
2020-09-30 15:15:11 +02:00 |
|
Adriane Boyd
|
6b7bb32834
|
Refactor Chinese initialization
|
2020-09-30 11:46:45 +02:00 |
|
Ines Montani
|
34f9c26c62
|
Add lexeme norm defaults
|
2020-09-30 10:20:14 +02:00 |
|
Ines Montani
|
a5debb356d
|
Tidy up and adjust logging [ci skip]
|
2020-09-30 01:22:08 +02:00 |
|
Ines Montani
|
56a2f778c4
|
Add logging [ci skip]
|
2020-09-30 01:08:55 +02:00 |
|
Ines Montani
|
fe3f111c37
|
Merge pull request #6168 from explosion/fix/default-corpus-values
|
2020-09-30 00:24:02 +02:00 |
|
Ines Montani
|
b799af16de
|
Don't raise in Pipe.initialize if not implemented
|
2020-09-30 00:05:27 +02:00 |
|
Matthew Honnibal
|
bc61691f6f
|
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
|
2020-09-29 23:41:04 +02:00 |
|
Matthew Honnibal
|
f52249fe2e
|
Fix data augmentation
|
2020-09-29 23:40:54 +02:00 |
|
Matthew Honnibal
|
14c4da547f
|
Try to fix augmentation
|
2020-09-29 23:08:56 +02:00 |
|
Ines Montani
|
ae51843468
|
Remove augmenter from jinja template [ci skip]
|
2020-09-29 23:08:50 +02:00 |
|
Ines Montani
|
9bb958fd0a
|
Fix debug data [ci skip]
|
2020-09-29 23:07:11 +02:00 |
|
Matthew Honnibal
|
a2aa1f6882
|
Disable the OVL augmentation by default
|
2020-09-29 23:02:40 +02:00 |
|
Ines Montani
|
df8dd91b6f
|
Merge branch 'develop' into fix/default-corpus-values
|
2020-09-29 22:55:39 +02:00 |
|
Ines Montani
|
0a1ee109db
|
Remove init from path
|
2020-09-29 22:53:18 +02:00 |
|
Ines Montani
|
ad6d40d028
|
Add logging
|
2020-09-29 22:53:14 +02:00 |
|
Ines Montani
|
c334a7d45f
|
Remove
|
2020-09-29 22:38:39 +02:00 |
|
Ines Montani
|
1aeef3bfbb
|
Make corpus paths default to None and improve errors
|
2020-09-29 22:33:46 +02:00 |
|
Ines Montani
|
0250bcf6a3
|
Show validation error during init
|
2020-09-29 22:29:09 +02:00 |
|
Ines Montani
|
da30bae8a6
|
Use __pyx_vtable__ instead of __reduce_cython__
|
2020-09-29 22:04:17 +02:00 |
|
Ines Montani
|
43c92ec8c9
|
Resolve dir for better output [ci skip]
|
2020-09-29 22:01:04 +02:00 |
|
Ines Montani
|
fa47f87924
|
Tidy up and auto-format
|
2020-09-29 21:39:28 +02:00 |
|
Ines Montani
|
604be54a5c
|
Support --code in evaluate CLI [ci skip]
|
2020-09-29 21:20:56 +02:00 |
|
Ines Montani
|
6467a560e3
|
WIP: Test updating Chinese tokenizer
|
2020-09-29 21:10:22 +02:00 |
|
Ines Montani
|
4f3102d09c
|
Auto-format
|
2020-09-29 21:09:10 +02:00 |
|
Ines Montani
|
798040bc1d
|
Fix language detection
|
2020-09-29 21:08:13 +02:00 |
|
Ines Montani
|
78021089f9
|
Merge pull request #6160 from explosion/feature/prepare
|
2020-09-29 20:55:13 +02:00 |
|
Ines Montani
|
c3f8c09d7d
|
Merge pull request #6154 from adrianeboyd/bugfix/chinese-tokenizer-pickle
|
2020-09-29 20:54:59 +02:00 |
|
Ines Montani
|
d3c63b7965
|
Merge branch 'develop' into feature/prepare
|
2020-09-29 20:53:05 +02:00 |
|
Ines Montani
|
2be80379ec
|
Fix small issues, resolve_dot_names and debug model
|
2020-09-29 20:38:35 +02:00 |
|
Matthew Honnibal
|
a4da3120b4
|
Fix multitasks
|
2020-09-29 18:33:16 +02:00 |
|
Matthew Honnibal
|
0b5c72fce2
|
Fix incorrect docstrings
|
2020-09-29 18:30:38 +02:00 |
|
Ines Montani
|
7851020653
|
Update tests
|
2020-09-29 18:14:15 +02:00 |
|
Ines Montani
|
71a0ee274a
|
Move init labels to init pipeline module
|
2020-09-29 18:09:33 +02:00 |
|
Ines Montani
|
dba26186ef
|
Handle None default args in Cython methods
|
2020-09-29 18:08:02 +02:00 |
|
Ines Montani
|
9353a82076
|
Auto-format
|
2020-09-29 18:07:48 +02:00 |
|
Ines Montani
|
534e1ef498
|
Fix template
|
2020-09-29 17:02:55 +02:00 |
|
Ines Montani
|
f2352eb701
|
Test with default value
|
2020-09-29 17:00:40 +02:00 |
|
Matthew Honnibal
|
8ce9f44433
|
Merge branch 'feature/prepare' of https://github.com/explosion/spaCy into feature/prepare
|
2020-09-29 16:57:38 +02:00 |
|
Matthew Honnibal
|
e4f535a964
|
Fix Pipe.labels
|
2020-09-29 16:55:07 +02:00 |
|
Matthew Honnibal
|
4ad26f4a2f
|
Move reader
|
2020-09-29 16:54:53 +02:00 |
|
Ines Montani
|
30c76dbd67
|
Merge branch 'feature/prepare' of https://github.com/explosion/spaCy into feature/prepare
|
2020-09-29 16:53:48 +02:00 |
|
Matthew Honnibal
|
43fc7a316d
|
Add registry function for reading jsonl
|
2020-09-29 16:49:09 +02:00 |
|
Matthew Honnibal
|
1fd002180e
|
Allow more components to use labels
|
2020-09-29 16:48:56 +02:00 |
|
Matthew Honnibal
|
99bff78617
|
Use labels in tagger
|
2020-09-29 16:48:44 +02:00 |
|
Matthew Honnibal
|
ca72608059
|
Fix language
|
2020-09-29 16:48:33 +02:00 |
|
Matthew Honnibal
|
10847c7f4e
|
Fix arg
|
2020-09-29 16:48:07 +02:00 |
|
Ines Montani
|
fd594cfb9b
|
Tighten up format
|
2020-09-29 16:47:55 +02:00 |
|
Matthew Honnibal
|
e70a00fa76
|
Remove unnecessary warning from train
|
2020-09-29 16:47:54 +02:00 |
|
Matthew Honnibal
|
3f0d61232d
|
Remove outdated arg from train
|
2020-09-29 16:47:44 +02:00 |
|
Matthew Honnibal
|
e957d66b92
|
Merge branch 'feature/prepare' of https://github.com/explosion/spaCy into feature/prepare
|
2020-09-29 16:22:53 +02:00 |
|
Ines Montani
|
978ab54a84
|
Fix logging
|
2020-09-29 16:22:41 +02:00 |
|
Matthew Honnibal
|
45daf5c9fe
|
Add init labels command
|
2020-09-29 16:22:37 +02:00 |
|
Matthew Honnibal
|
58c8d4b414
|
Add label_data property to pipeline
|
2020-09-29 16:22:13 +02:00 |
|
Ines Montani
|
aa2a6882d0
|
Fix logging
|
2020-09-29 16:08:39 +02:00 |
|
Ines Montani
|
63d1598137
|
Simplify config use in Language.initialize
|
2020-09-29 16:05:48 +02:00 |
|
Ines Montani
|
56f8bc73ef
|
Add more tests
|
2020-09-29 15:23:34 +02:00 |
|
Sofie Van Landeghem
|
6a04e5adea
|
encoding UTF8 (#6161)
|
2020-09-29 14:49:55 +02:00 |
|
Ines Montani
|
591038b1a4
|
Add test
|
2020-09-29 12:54:52 +02:00 |
|
Ines Montani
|
adca08a12f
|
Pass nlp forward
|
2020-09-29 12:21:52 +02:00 |
|
Ines Montani
|
f171903139
|
Clean up sgd and pipeline -> nlp
|
2020-09-29 12:20:26 +02:00 |
|
Ines Montani
|
612bbf85ab
|
Update initialize.py
|
2020-09-29 12:14:47 +02:00 |
|
Ines Montani
|
42f0e4c946
|
Clean up
|
2020-09-29 12:14:08 +02:00 |
|
Matthew Honnibal
|
9c8b2524fe
|
Upd initialize args
|
2020-09-29 12:08:37 +02:00 |
|
Matthew Honnibal
|
e1fdf2b7c5
|
Upd tests
|
2020-09-29 12:05:38 +02:00 |
|
Ines Montani
|
50410c17ac
|
Update schemas.py
|
2020-09-29 12:05:38 +02:00 |
|
Matthew Honnibal
|
f2d1b7feb5
|
Clean up sgd
|
2020-09-29 12:00:08 +02:00 |
|
Ines Montani
|
78396d137f
|
Integrate initialize settings
|
2020-09-29 11:57:08 +02:00 |
|
Ines Montani
|
dec984a9c1
|
Update Language.initialize and support components/tokenizer settings
|
2020-09-29 11:52:45 +02:00 |
|
Matthew Honnibal
|
b3b6868639
|
Remove 'sgd' arg from component initialize
|
2020-09-29 11:42:35 +02:00 |
|
Matthew Honnibal
|
5276db6f3f
|
Remove 'device' argument from Language, clean up 'sgd' arg
|
2020-09-29 11:42:19 +02:00 |
|
Ines Montani
|
4925ad760a
|
Add init vectors
|
2020-09-29 10:58:50 +02:00 |
|
svlandeg
|
64d90039a1
|
encoding UTF8
|
2020-09-29 10:54:42 +02:00 |
|
Ines Montani
|
ff9a63bfbd
|
begin_training -> initialize
|
2020-09-28 21:35:09 +02:00 |
|
Ines Montani
|
046f655d86
|
Fix error
|
2020-09-28 21:17:45 +02:00 |
|
Ines Montani
|
a139fe672b
|
Fix typos and refactor CLI logging
|
2020-09-28 21:17:10 +02:00 |
|
Ines Montani
|
2e9c9e74af
|
Fix config resolution and interpolation
TODO: auto-interpolate in Thinc if config is dict (i.e. likely subsection)
|
2020-09-28 15:34:00 +02:00 |
|
Ines Montani
|
02838a1d47
|
Fix resolve_dot_names
|
2020-09-28 15:27:10 +02:00 |
|
Ines Montani
|
822ea4ef61
|
Refactor CLI
|
2020-09-28 15:09:59 +02:00 |
|
Ines Montani
|
a89e0ff7cb
|
Fix typo
|
2020-09-28 12:55:21 +02:00 |
|
Ines Montani
|
a62337b3f3
|
Tidy up vocab init
|
2020-09-28 12:53:06 +02:00 |
|
Ines Montani
|
c22ecc66bb
|
Don't support init path for now
|
2020-09-28 12:46:28 +02:00 |
|
Ines Montani
|
f49288ab81
|
Update default_config_pretraining.cfg
|
2020-09-28 12:31:54 +02:00 |
|
Ines Montani
|
a5f2cc0509
|
Tidy up and remove raw text (rehearsal) for now
|
2020-09-28 12:30:13 +02:00 |
|
Ines Montani
|
1590de11b1
|
Update config
|
2020-09-28 12:05:23 +02:00 |
|
Matthew Honnibal
|
9f6ad06452
|
Upd default config
|
2020-09-28 12:00:23 +02:00 |
|
Ines Montani
|
e44a7519cd
|
Update CLI and add [initialize] block
|
2020-09-28 11:56:14 +02:00 |
|
Ines Montani
|
d5155376fd
|
Update vocab init
|
2020-09-28 11:30:18 +02:00 |
|