svlandeg
a48553c1ed
fix error numbers
2020-06-15 08:51:31 +02:00
Matthew Honnibal
3c0fc10dc4
Remove beam for now (maybe)
...
Remove beam_utils
Update setup.py
Remove beam
2020-06-14 19:53:29 +02:00
Matthew Honnibal
98ca14f577
Remove GoldParse
...
WIP on removing goldparse
Get ArcEager compiling after GoldParse excise
Update setup.py
Get spacy.syntax compiling after removing GoldParse
Rename NewExample -> Example and clean up
Clean html files
Start updating tests
Update Morphologizer
2020-06-14 19:53:30 +02:00
Matthew Honnibal
d53723aa4f
Merge from whatif/arrow
2020-06-14 17:43:59 +02:00
Matthew Honnibal
380cce9d8b
Update errors
2020-06-14 17:40:05 +02:00
Matthew Honnibal
706e652820
Merge from develop
2020-06-14 17:35:01 +02:00
Matthew Honnibal
9296d71a54
More GoldParse excise
2020-06-14 17:26:54 +02:00
Matthew Honnibal
60d4e5a9e0
WIP on updating transition-system
2020-06-14 17:22:14 +02:00
Matthew Honnibal
7d65615625
WIP start excising GoldParse
2020-06-14 17:11:41 +02:00
Matthew Honnibal
4362ec7084
Hack Language.evaluate
2020-06-13 23:37:42 +02:00
Matthew Honnibal
7de997c0a5
Update test
2020-06-13 23:11:45 +02:00
Matthew Honnibal
8f941ef527
Update GoldParse
2020-06-13 23:11:29 +02:00
Matthew Honnibal
3a0bbcfb4c
Add biluo_tags_from_doc function
2020-06-13 23:10:54 +02:00
Matthew Honnibal
caa7508725
Draft missing NewExample stuff
2020-06-13 23:10:21 +02:00
Matthew Honnibal
3eb8f3867e
Update test
2020-06-13 23:05:16 +02:00
Matthew Honnibal
5564314d32
Suggest approach for GoldParse
2020-06-13 15:43:35 +02:00
Matthew Honnibal
b078b05ecd
Handle various data better in NewExample
2020-06-13 15:30:12 +02:00
svlandeg
face0de74f
fix MORPH conversion + enable unit test
2020-06-12 16:29:09 +02:00
svlandeg
a5ee082da1
cats bugfix
2020-06-12 15:49:38 +02:00
svlandeg
880dccf93e
entities on doc_annotation, parse links and check their offsets against the entities. unit test works
2020-06-12 15:47:20 +02:00
svlandeg
3aed177a35
fix ENT_IOB conversion and enable unit test
2020-06-12 11:30:24 +02:00
Matthew Honnibal
a1c5b694be
Small fixes to train defaults
2020-06-12 02:22:13 +02:00
Sofie Van Landeghem
c0f4a1e43b
train is from-config by default (#5575)
...
* verbose and tag_map options
* adding init_tok2vec option and only changing the tok2vec that is specified
* adding omit_extra_lookups and verifying textcat config
* wip
* pretrain bugfix
* add replace and resume options
* train_textcat fix
* raw text functionality
* improve UX when KeyError or when input data can't be parsed
* avoid unnecessary access to goldparse in TextCat pipe
* save performance information in nlp.meta
* add noise_level to config
* move nn_parser's defaults to config file
* multitask in config - doesn't work yet
* scorer offering both F and AUC options, need to be specified in config
* add textcat verification code from old train script
* small fixes to config files
* clean up
* set default config for ner/parser to allow create_pipe to work as before
* two more test fixes
* small fixes
* cleanup
* fix NER pickling + additional unit test
* create_pipe as before
2020-06-12 02:02:07 +02:00
svlandeg
6a67a11682
adding tests for new example class (some still failing - WIP)
2020-06-11 17:43:40 +02:00
Matthew Honnibal
488727aee0
Start updating test
2020-06-09 23:58:28 +02:00
Matthew Honnibal
337d2b5ad6
Fix sent start in NewExample
2020-06-09 23:58:16 +02:00
Matthew Honnibal
ad547a4b8f
Refactor towards new Example class
2020-06-09 23:39:46 +02:00
Matthew Honnibal
82810b9846
Update morphologizer
2020-06-09 23:32:07 +02:00
Matthew Honnibal
af1b5f129b
Use new example class in GoldCorpus
2020-06-09 23:31:19 +02:00
Matthew Honnibal
0714f1fa5c
Remove the 'pass example into __call__' thing
2020-06-09 23:30:06 +02:00
Matthew Honnibal
b3868cd1f8
Update NewExample
2020-06-09 23:06:48 +02:00
Matthew Honnibal
ccd332a9fc
Update test stubs
2020-06-09 15:49:04 +02:00
Matthew Honnibal
04569c0b3e
Fix import
2020-06-09 15:44:08 +02:00
Matthew Honnibal
f4caaa8ad9
Update alignment
2020-06-09 15:43:57 +02:00
Matthew Honnibal
b5ef397639
Add header for align.pxd
2020-06-09 15:43:48 +02:00
Matthew Honnibal
793092d2d8
Fix renaming in GoldCorpus
2020-06-09 15:43:38 +02:00
Matthew Honnibal
36d49a0f13
Fix NewExample class
2020-06-09 15:43:19 +02:00
Matthew Honnibal
f1189dc205
Draft tests for new Example class
2020-06-09 15:43:08 +02:00
Matthew Honnibal
c833ebe1ad
Start tests for new example class
2020-06-09 15:29:05 +02:00
Matthew Honnibal
453cfa14d0
Start drafting new example class
2020-06-09 15:28:42 +02:00
Matthew Honnibal
449000c234
Fix gold_io
2020-06-09 12:43:53 +02:00
Matthew Honnibal
cb08ce3936
Move alignment into Cython
2020-06-09 12:40:41 +02:00
Matthew Honnibal
20a1bdb298
Fix train
2020-06-09 12:33:29 +02:00
Matthew Honnibal
549164c31c
Fix corpus when no raw text supplied
2020-06-09 12:33:14 +02:00
Matthew Honnibal
d9289712ba
* Make GoldCorpus return dict, not Example
...
* Make Example require a Doc object (previously optional)
Clarify methods in GoldCorpus
WIP refactor Example
Refactor Example.split_sents
Fix test
Fix augment
Update test
Update test
Fix import
Update test_scorer
Update Example
2020-06-09 01:01:59 +02:00
Matthew Honnibal
084271c9e9
Remove GoldParse from public API
...
* Move get_parses_from_example to spacy.syntax
* Get GoldParse out of Example
* Avoid expecting GoldParse input in parser
* Add Alignment to spacy.gold.align
* Update Example object
* Add comment
* Update pipeline
* Fix imports
* Simplify gold_io
* WIP on GoldCorpus
* Update test
* Xfail some gold tests
* Remove ignore_misaligned option from GoldCorpus
* Fix Example constructor
* Update test
* Fix usage of Example
* Add deprecated_get_gold method on Example
* Patch scorer
* Fix test
* Fix test
* Update tests
* Xfail a test
* Fix passing of make_projective
* Pass make_projective by default
* Hack data format in Example.from_dict
* Update tests
* Fix example.from_dict
* Update morphologizer
* Fix entity linker
* Add get_field to TokenAnnotation
* Fix Example.get_aligned
* Update test
* Fix alignment
* Fix corpus
* Fix GoldCorpus
* Handle misaligned
* Format
* Fix missing import
2020-06-08 22:09:57 +02:00
Matthew Honnibal
b69fa77ccc
Add missing inits
2020-06-06 15:38:46 +02:00
Matthew Honnibal
6e87ca1f45
Fix imports
2020-06-06 15:36:58 +02:00
Matthew Honnibal
53b00991fd
Fix imports
2020-06-06 15:36:46 +02:00
Matthew Honnibal
74204116a3
Rename _gold -> gold
2020-06-06 15:29:32 +02:00
Matthew Honnibal
7f135736f4
Fix imports
2020-06-06 15:28:52 +02:00
Matthew Honnibal
17533a9286
Format
2020-06-06 15:13:07 +02:00
Matthew Honnibal
0f9b4bbfea
Fix imports
2020-06-06 15:12:52 +02:00
Matthew Honnibal
866179350b
Fix import
2020-06-06 15:11:13 +02:00
Matthew Honnibal
3baa1ada03
Refactor spacy.gold
2020-06-06 15:10:33 +02:00
Matthew Honnibal
1d2e39d974
Support to_dict in Doc
2020-06-06 15:10:10 +02:00
Matthew Honnibal
7b873ce2b1
Move GoldParse under spacy.syntax
2020-06-06 15:09:43 +02:00
Matthew Honnibal
32c8fb1372
Add gold_io.pyx
2020-06-06 14:41:49 +02:00
Matthew Honnibal
156466ca69
Add iob_utils
2020-06-06 14:39:14 +02:00
Matthew Honnibal
53e6473e24
Add to/from dict helpers
2020-06-06 14:29:06 +02:00
Matthew Honnibal
a663d44b1b
Add GoldCorpus
2020-06-06 14:28:37 +02:00
Matthew Honnibal
1fb8fc6ea9
Add Example class
2020-06-06 14:24:35 +02:00
Matthew Honnibal
cce6a51a9c
Add annotation classes
2020-06-06 14:22:27 +02:00
Matthew Honnibal
6005b94e74
Add data augmentation
2020-06-06 14:19:06 +02:00
Matthew Honnibal
fcb4f7a6db
Start breaking down gold.pyx
2020-06-06 14:15:12 +02:00
Ines Montani
d93cbeb14f
Add warning for loose version constraints (#5536)
...
* Add warning for loose version constraints
* Update wording [ci skip]
* Tweak error message
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-06-05 12:42:15 +02:00
Matthew Honnibal
8411d4f4e6
Merge pull request #5543 from svlandeg/feature/pretrain-config
...
pretrain from config
2020-06-04 19:07:12 +02:00
svlandeg
3ade455fd3
formatting
2020-06-04 16:09:55 +02:00
svlandeg
776d4f1190
cleanup
2020-06-04 16:07:30 +02:00
svlandeg
6b027d7689
remove duplicate model definition of tok2vec layer
2020-06-04 15:49:23 +02:00
svlandeg
1775f54a26
small little fixes
2020-06-03 22:17:02 +02:00
svlandeg
07886a3de3
rename init_tok2vec to resume
2020-06-03 22:00:25 +02:00
svlandeg
4ed6278663
small fixes to pretrain config, init_tok2vec TODO
2020-06-03 19:32:40 +02:00
svlandeg
ffe0451d09
pretrain from config
2020-06-03 14:45:00 +02:00
Ines Montani
a8875d4a4b
Fix typo
2020-06-03 14:42:39 +02:00
Ines Montani
4e0610d0d4
Update warning codes
2020-06-03 14:37:09 +02:00
Ines Montani
810fce3bb1
Merge branch 'develop' into master-tmp
2020-06-03 14:36:59 +02:00
Adriane Boyd
b0ee76264b
Remove debugging
2020-06-03 14:20:42 +02:00
Adriane Boyd
1d8168d1fd
Fix problems with lower and whitespace in variants
...
Port relevant changes from #5361:
* Initialize lower flag explicitly
* Handle whitespace words from GoldParse correctly when creating raw text with orth variants
2020-06-03 14:15:58 +02:00
Adriane Boyd
10d938f221
Update default cfg dir in train CLI
2020-06-03 14:15:50 +02:00
Adriane Boyd
f1f9c8b417
Port train CLI updates
...
Updates from #5362 and fix from #5387:
* `train`:
  * if training on GPU, only run evaluation/timing on CPU in the first iteration
  * if training is aborted, exit with a non-0 exit status
2020-06-03 14:03:43 +02:00
svlandeg
eac12cbb77
make dropout in embed layers configurable
2020-06-03 11:50:16 +02:00
svlandeg
e91485dfc4
add discard_oversize parameter, move optimizer to training subsection
2020-06-03 10:04:16 +02:00
svlandeg
03c58b488c
prevent infinite loop, custom warning
2020-06-03 10:00:21 +02:00
svlandeg
6504b7f161
Merge remote-tracking branch 'upstream/develop' into feature/pretrain-config
2020-06-03 08:30:16 +02:00
svlandeg
c5ac382f0a
fix name clash
2020-06-02 22:24:57 +02:00
svlandeg
2bf5111ecf
additional test with discard_oversize=False
2020-06-02 22:09:37 +02:00
svlandeg
aa6271b16c
extending algorithm to deal better with edge cases
2020-06-02 22:05:08 +02:00
svlandeg
f2e162fc60
it's only oversized if the tolerance level is also exceeded
2020-06-02 19:59:04 +02:00
svlandeg
ef834b4cd7
fix comments
2020-06-02 19:50:44 +02:00
svlandeg
6208d322d3
slightly more challenging unit test
2020-06-02 19:47:30 +02:00
svlandeg
6651fafd5c
using overflow buffer for examples within the tolerance margin
2020-06-02 19:43:39 +02:00
svlandeg
85b0597ed5
add test for minibatch util
2020-06-02 18:26:21 +02:00
svlandeg
5b350a6c99
bugfix of the bugfix
2020-06-02 17:49:33 +02:00
svlandeg
fdfd822936
rewrite minibatch_by_words function
2020-06-02 15:22:54 +02:00
svlandeg
ec52e7f886
add oversize examples before StopIteration returns
2020-06-02 13:21:55 +02:00
svlandeg
e0f9f448f1
remove Tensorizer
2020-06-01 23:38:48 +02:00
Ines Montani
b5ae2edcba
Merge pull request #5516 from explosion/feature/improve-model-version-deps
2020-05-31 12:54:01 +02:00
Ines Montani
dc186afdc5
Add warning
2020-05-30 15:34:54 +02:00
Ines Montani
b7aff6020c
Make functions more general purpose and update docstrings and tests
2020-05-30 15:18:53 +02:00