svlandeg
0f123af35e
ensure test keeps working with non-linked entities
2020-06-17 21:13:38 +02:00
svlandeg
6d73e139b0
fix entity linker
2020-06-17 21:12:25 +02:00
svlandeg
be5934b827
fix tagger
2020-06-17 19:42:11 +02:00
svlandeg
10d396977e
add support for MORPH in to/from_array, fix morphologizer overfitting test
2020-06-17 17:48:07 +02:00
svlandeg
1a151b10d6
correct silly typo
2020-06-17 14:48:14 +02:00
svlandeg
f6c451b650
cleanup
2020-06-17 14:45:54 +02:00
svlandeg
2d9f406188
fix test_cli
2020-06-17 14:42:48 +02:00
svlandeg
f7ad8e8c83
various fixes in scripts - needs to be further tested
2020-06-17 12:05:58 +02:00
svlandeg
3c4f9e4cc4
fix augment (needs further testing)
2020-06-17 10:46:29 +02:00
svlandeg
4ed399c848
minibatch utility can deal with strings, docs or examples
2020-06-16 21:35:55 +02:00
svlandeg
8b66c11ff2
add spaces to json output format
2020-06-16 19:30:03 +02:00
svlandeg
ba80ad7efd
fixed some tests + WIP roundtrip unit test
2020-06-16 18:26:50 +02:00
svlandeg
43d41d6bb6
allow None as BILUO annotation
2020-06-16 15:30:05 +02:00
svlandeg
44a0f9c2c8
test_gold_biluo_different_tokenization works
2020-06-16 15:21:20 +02:00
svlandeg
1c35b8efcd
fix spaces
2020-06-16 12:08:25 +02:00
svlandeg
6fea5fa4bd
attempt to fix cases with weird spaces
2020-06-16 11:52:29 +02:00
svlandeg
0702a1d3fb
fix test for misaligned
2020-06-15 23:10:47 +02:00
svlandeg
a28f8f369e
Fix many-to-one IOB codes
2020-06-15 23:06:22 +02:00
svlandeg
12886b787b
fixing NER one-to-many alignment
2020-06-15 22:44:17 +02:00
Matthew Honnibal
a0bf73a5dd
Merge branch 'whatif/arrow' of https://github.com/explosion/spaCy into whatif/arrow
2020-06-15 18:16:01 +02:00
Matthew Honnibal
c66f93299e
Remove TokenAnnotation code from nonproj
2020-06-15 18:14:47 +02:00
Matthew Honnibal
c95494739c
Fix import
2020-06-15 18:11:10 +02:00
Matthew Honnibal
8f978f2031
Fix import
2020-06-15 18:10:47 +02:00
Matthew Honnibal
95de7efaad
Draft create_gold_state for arc_eager oracle
2020-06-15 18:10:19 +02:00
svlandeg
68986a252e
additional tests for new get_aligned function
2020-06-15 17:42:40 +02:00
svlandeg
41d29983a7
start testing get_aligned
2020-06-15 17:16:01 +02:00
svlandeg
fd5f199feb
fixing language and scoring tests
2020-06-15 15:02:05 +02:00
svlandeg
b4d914ec77
fix error catching
2020-06-15 12:56:32 +02:00
svlandeg
b9c9cbb2cd
informative error when calling to_array with wrong field
2020-06-15 11:53:31 +02:00
svlandeg
ff231e1cdd
fix merge conflict
2020-06-15 09:04:19 +02:00
svlandeg
a48553c1ed
fix error numbers
2020-06-15 08:51:31 +02:00
Matthew Honnibal
3c0fc10dc4
Remove beam for now (maybe)
Remove beam_utils
Update setup.py
Remove beam
2020-06-14 19:53:29 +02:00
Matthew Honnibal
98ca14f577
Remove GoldParse
WIP on removing goldparse
Get ArcEager compiling after GoldParse excise
Update setup.py
Get spacy.syntax compiling after removing GoldParse
Rename NewExample -> Example and clean up
Clean html files
Start updating tests
Update Morphologizer
2020-06-14 19:53:30 +02:00
Matthew Honnibal
d53723aa4f
Merge from whatif/arrow
2020-06-14 17:43:59 +02:00
Matthew Honnibal
380cce9d8b
Update errors
2020-06-14 17:40:05 +02:00
Matthew Honnibal
706e652820
Merge from develop
2020-06-14 17:35:01 +02:00
Matthew Honnibal
9296d71a54
More GoldParse excise
2020-06-14 17:26:54 +02:00
Matthew Honnibal
60d4e5a9e0
WIP on updating transition-system
2020-06-14 17:22:14 +02:00
Matthew Honnibal
7d65615625
WIP start excising GoldParse
2020-06-14 17:11:41 +02:00
Matthew Honnibal
4362ec7084
Hack Language.evaluate
2020-06-13 23:37:42 +02:00
Matthew Honnibal
7de997c0a5
Update test
2020-06-13 23:11:45 +02:00
Matthew Honnibal
8f941ef527
Update GoldParse
2020-06-13 23:11:29 +02:00
Matthew Honnibal
3a0bbcfb4c
Add biluo_tags_from_doc function
2020-06-13 23:10:54 +02:00
Matthew Honnibal
caa7508725
Draft missing NewExample stuff
2020-06-13 23:10:21 +02:00
Matthew Honnibal
3eb8f3867e
Update test
2020-06-13 23:05:16 +02:00
Matthew Honnibal
5564314d32
Suggest approach for GoldParse
2020-06-13 15:43:35 +02:00
Matthew Honnibal
b078b05ecd
Handle various data better in NewExample
2020-06-13 15:30:12 +02:00
svlandeg
face0de74f
fix MORPH conversion + enable unit test
2020-06-12 16:29:09 +02:00
svlandeg
a5ee082da1
cats bugfix
2020-06-12 15:49:38 +02:00
svlandeg
880dccf93e
entities on doc_annotation, parse links and check their offsets against the entities. unit test works
2020-06-12 15:47:20 +02:00
svlandeg
3aed177a35
fix ENT_IOB conversion and enable unit test
2020-06-12 11:30:24 +02:00
Matthew Honnibal
a1c5b694be
Small fixes to train defaults
2020-06-12 02:22:13 +02:00
Sofie Van Landeghem
c0f4a1e43b
train is from-config by default (#5575)
* verbose and tag_map options
* adding init_tok2vec option and only changing the tok2vec that is specified
* adding omit_extra_lookups and verifying textcat config
* wip
* pretrain bugfix
* add replace and resume options
* train_textcat fix
* raw text functionality
* improve UX when KeyError or when input data can't be parsed
* avoid unnecessary access to goldparse in TextCat pipe
* save performance information in nlp.meta
* add noise_level to config
* move nn_parser's defaults to config file
* multitask in config - doesn't work yet
* scorer offering both F and AUC options, need to be specified in config
* add textcat verification code from old train script
* small fixes to config files
* clean up
* set default config for ner/parser to allow create_pipe to work as before
* two more test fixes
* small fixes
* cleanup
* fix NER pickling + additional unit test
* create_pipe as before
2020-06-12 02:02:07 +02:00
svlandeg
6a67a11682
adding tests for new example class (some still failing - WIP)
2020-06-11 17:43:40 +02:00
Matthew Honnibal
488727aee0
Start updating test
2020-06-09 23:58:28 +02:00
Matthew Honnibal
337d2b5ad6
Fix sent start in NewExample
2020-06-09 23:58:16 +02:00
Matthew Honnibal
ad547a4b8f
Refactor towards new Example class
2020-06-09 23:39:46 +02:00
Matthew Honnibal
82810b9846
Update morphologizer
2020-06-09 23:32:07 +02:00
Matthew Honnibal
af1b5f129b
Use new example class in GoldCorpus
2020-06-09 23:31:19 +02:00
Matthew Honnibal
0714f1fa5c
Remove the 'pass example into __call__' thing
2020-06-09 23:30:06 +02:00
Matthew Honnibal
b3868cd1f8
Update NewExample
2020-06-09 23:06:48 +02:00
Matthew Honnibal
ccd332a9fc
Update test stubs
2020-06-09 15:49:04 +02:00
Matthew Honnibal
04569c0b3e
Fix import
2020-06-09 15:44:08 +02:00
Matthew Honnibal
f4caaa8ad9
Update alignment
2020-06-09 15:43:57 +02:00
Matthew Honnibal
b5ef397639
Add header for align.pxd
2020-06-09 15:43:48 +02:00
Matthew Honnibal
793092d2d8
Fix renaming in GoldCorpus
2020-06-09 15:43:38 +02:00
Matthew Honnibal
36d49a0f13
Fix NewExample class
2020-06-09 15:43:19 +02:00
Matthew Honnibal
f1189dc205
Draft tests for new Example class
2020-06-09 15:43:08 +02:00
Matthew Honnibal
c833ebe1ad
Start tests for new example class
2020-06-09 15:29:05 +02:00
Matthew Honnibal
453cfa14d0
Start drafting new example class
2020-06-09 15:28:42 +02:00
Matthew Honnibal
449000c234
Fix gold_io
2020-06-09 12:43:53 +02:00
Matthew Honnibal
cb08ce3936
Move alignment into Cython
2020-06-09 12:40:41 +02:00
Matthew Honnibal
20a1bdb298
Fix train
2020-06-09 12:33:29 +02:00
Matthew Honnibal
549164c31c
Fix corpus when no raw text supplied
2020-06-09 12:33:14 +02:00
Matthew Honnibal
d9289712ba
* Make GoldCorpus return dict, not Example
* Make Example require a Doc object (previously optional)
Clarify methods in GoldCorpus
WIP refactor Example
Refactor Example.split_sents
Fix test
Fix augment
Update test
Update test
Fix import
Update test_scorer
Update Example
2020-06-09 01:01:59 +02:00
Matthew Honnibal
084271c9e9
Remove GoldParse from public API
* Move get_parses_from_example to spacy.syntax
* Get GoldParse out of Example
* Avoid expecting GoldParse input in parser
* Add Alignment to spacy.gold.align
* Update Example object
* Add comment
* Update pipeline
* Fix imports
* Simplify gold_io
* WIP on GoldCorpus
* Update test
* Xfail some gold tests
* Remove ignore_misaligned option from GoldCorpus
* Fix Example constructor
* Update test
* Fix usage of Example
* Add deprecated_get_gold method on Example
* Patch scorer
* Fix test
* Fix test
* Update tests
* Xfail a test
* Fix passing of make_projective
* Pass make_projective by default
* Hack data format in Example.from_dict
* Update tests
* Fix example.from_dict
* Update morphologizer
* Fix entity linker
* Add get_field to TokenAnnotation
* Fix Example.get_aligned
* Update test
* Fix alignment
* Fix corpus
* Fix GoldCorpus
* Handle misaligned
* Format
* Fix missing import
2020-06-08 22:09:57 +02:00
Matthew Honnibal
b69fa77ccc
Add missing inits
2020-06-06 15:38:46 +02:00
Matthew Honnibal
6e87ca1f45
Fix imports
2020-06-06 15:36:58 +02:00
Matthew Honnibal
53b00991fd
Fix imports
2020-06-06 15:36:46 +02:00
Matthew Honnibal
74204116a3
Rename _gold -> gold
2020-06-06 15:29:32 +02:00
Matthew Honnibal
7f135736f4
Fix imports
2020-06-06 15:28:52 +02:00
Matthew Honnibal
17533a9286
Format
2020-06-06 15:13:07 +02:00
Matthew Honnibal
0f9b4bbfea
Fix imports
2020-06-06 15:12:52 +02:00
Matthew Honnibal
866179350b
Fix import
2020-06-06 15:11:13 +02:00
Matthew Honnibal
3baa1ada03
Refactor spacy.gold
2020-06-06 15:10:33 +02:00
Matthew Honnibal
1d2e39d974
Support to_dict in Doc
2020-06-06 15:10:10 +02:00
Matthew Honnibal
7b873ce2b1
Move GoldParse under spacy.syntax
2020-06-06 15:09:43 +02:00
Matthew Honnibal
32c8fb1372
Add gold_io.pyx
2020-06-06 14:41:49 +02:00
Matthew Honnibal
156466ca69
Add iob_utils
2020-06-06 14:39:14 +02:00
Matthew Honnibal
53e6473e24
Add to/from dict helpers
2020-06-06 14:29:06 +02:00
Matthew Honnibal
a663d44b1b
Add GoldCorpus
2020-06-06 14:28:37 +02:00
Matthew Honnibal
1fb8fc6ea9
Add Example class
2020-06-06 14:24:35 +02:00
Matthew Honnibal
cce6a51a9c
Add annotation classes
2020-06-06 14:22:27 +02:00
Matthew Honnibal
6005b94e74
Add data augmentation
2020-06-06 14:19:06 +02:00
Matthew Honnibal
fcb4f7a6db
Start breaking down gold.pyx
2020-06-06 14:15:12 +02:00
Ines Montani
d93cbeb14f
Add warning for loose version constraints (#5536)
* Add warning for loose version constraints
* Update wording [ci skip]
* Tweak error message
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-06-05 12:42:15 +02:00
Matthew Honnibal
8411d4f4e6
Merge pull request #5543 from svlandeg/feature/pretrain-config
pretrain from config
2020-06-04 19:07:12 +02:00
svlandeg
3ade455fd3
formatting
2020-06-04 16:09:55 +02:00
svlandeg
776d4f1190
cleanup
2020-06-04 16:07:30 +02:00
svlandeg
6b027d7689
remove duplicate model definition of tok2vec layer
2020-06-04 15:49:23 +02:00
svlandeg
1775f54a26
small little fixes
2020-06-03 22:17:02 +02:00
svlandeg
07886a3de3
rename init_tok2vec to resume
2020-06-03 22:00:25 +02:00
svlandeg
4ed6278663
small fixes to pretrain config, init_tok2vec TODO
2020-06-03 19:32:40 +02:00
svlandeg
ffe0451d09
pretrain from config
2020-06-03 14:45:00 +02:00
Ines Montani
a8875d4a4b
Fix typo
2020-06-03 14:42:39 +02:00
Ines Montani
4e0610d0d4
Update warning codes
2020-06-03 14:37:09 +02:00
Ines Montani
810fce3bb1
Merge branch 'develop' into master-tmp
2020-06-03 14:36:59 +02:00
Adriane Boyd
b0ee76264b
Remove debugging
2020-06-03 14:20:42 +02:00
Adriane Boyd
1d8168d1fd
Fix problems with lower and whitespace in variants
Port relevant changes from #5361:
* Initialize lower flag explicitly
* Handle whitespace words from GoldParse correctly when creating raw text with orth variants
2020-06-03 14:15:58 +02:00
Adriane Boyd
10d938f221
Update default cfg dir in train CLI
2020-06-03 14:15:50 +02:00
Adriane Boyd
f1f9c8b417
Port train CLI updates
Updates from #5362 and fix from #5387:
* `train`:
  * if training on GPU, only run evaluation/timing on CPU in the first iteration
  * if training is aborted, exit with a non-0 exit status
2020-06-03 14:03:43 +02:00
svlandeg
eac12cbb77
make dropout in embed layers configurable
2020-06-03 11:50:16 +02:00
svlandeg
e91485dfc4
add discard_oversize parameter, move optimizer to training subsection
2020-06-03 10:04:16 +02:00
svlandeg
03c58b488c
prevent infinite loop, custom warning
2020-06-03 10:00:21 +02:00
svlandeg
6504b7f161
Merge remote-tracking branch 'upstream/develop' into feature/pretrain-config
2020-06-03 08:30:16 +02:00
svlandeg
c5ac382f0a
fix name clash
2020-06-02 22:24:57 +02:00
svlandeg
2bf5111ecf
additional test with discard_oversize=False
2020-06-02 22:09:37 +02:00
svlandeg
aa6271b16c
extending algorithm to deal better with edge cases
2020-06-02 22:05:08 +02:00
svlandeg
f2e162fc60
it's only oversized if the tolerance level is also exceeded
2020-06-02 19:59:04 +02:00
svlandeg
ef834b4cd7
fix comments
2020-06-02 19:50:44 +02:00
svlandeg
6208d322d3
slightly more challenging unit test
2020-06-02 19:47:30 +02:00
svlandeg
6651fafd5c
using overflow buffer for examples within the tolerance margin
2020-06-02 19:43:39 +02:00
svlandeg
85b0597ed5
add test for minibatch util
2020-06-02 18:26:21 +02:00
svlandeg
5b350a6c99
bugfix of the bugfix
2020-06-02 17:49:33 +02:00
svlandeg
fdfd822936
rewrite minibatch_by_words function
2020-06-02 15:22:54 +02:00
svlandeg
ec52e7f886
add oversize examples before StopIteration returns
2020-06-02 13:21:55 +02:00
svlandeg
e0f9f448f1
remove Tensorizer
2020-06-01 23:38:48 +02:00
Ines Montani
b5ae2edcba
Merge pull request #5516 from explosion/feature/improve-model-version-deps
2020-05-31 12:54:01 +02:00
Ines Montani
dc186afdc5
Add warning
2020-05-30 15:34:54 +02:00
Ines Montani
b7aff6020c
Make functions more general purpose and update docstrings and tests
2020-05-30 15:18:53 +02:00
Ines Montani
a7e370bcbf
Don't override spaCy version
2020-05-30 15:03:18 +02:00
Ines Montani
e47e5a4b10
Use more sophisticated version parsing logic
2020-05-30 15:01:58 +02:00
Ines Montani
4fd087572a
WIP: improve model version deps
2020-05-28 12:51:37 +02:00
Matthw Honnibal
58750b06f8
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-05-27 22:18:36 +02:00
Ines Montani
1a15896ba9
unicode -> str consistency [ci skip]
2020-05-24 18:51:10 +02:00
Ines Montani
5d3806e059
unicode -> str consistency
2020-05-24 17:20:58 +02:00
Ines Montani
387c7aba15
Update test
2020-05-24 14:55:16 +02:00
Ines Montani
f9786d765e
Simplify is_package check
2020-05-24 14:48:56 +02:00
Matthw Honnibal
2d9de8684d
Support use_pytorch_for_gpu_memory config
2020-05-22 23:10:40 +02:00
Ines Montani
4465cad6c5
Rename spacy.analysis to spacy.pipe_analysis
2020-05-22 17:42:06 +02:00
Ines Montani
25d6ed3fb8
Merge pull request #5489 from explosion/feature/connected-components
2020-05-22 17:40:11 +02:00
Ines Montani
841c05b47b
Merge pull request #5490 from explosion/fix/remove-jsonschema
2020-05-22 17:39:54 +02:00
Ines Montani
569a65b60e
Auto-format
2020-05-22 16:55:42 +02:00
Ines Montani
d844528c5f
Add test for is_compatible_model
2020-05-22 16:55:15 +02:00
Ines Montani
12b7be1d98
Remove jsonschema from dependencies
2020-05-22 16:49:26 +02:00
Matthew Honnibal
f7f6df7275
Move to spacy.analysis
2020-05-22 16:43:18 +02:00
Matthew Honnibal
78d79d94ce
Guess set_annotations=True in nlp.update
During `nlp.update`, components can be passed a boolean `set_annotations`
to indicate whether they should assign annotations to the `Doc`. This
needs to be set if downstream components expect to use the
annotations during training, e.g. if we wanted to use tagger features in
the parser.
Components can specify their assignments and requirements, so we can
figure out which components have these inter-dependencies. After
figuring this out, we can guess whether to pass `set_annotations=True`.
We could also always pass `set_annotations=True`, or even just have this
as the only behaviour. The downside of this is that it would require the
`Doc` objects to be created afresh to avoid problematic modifications.
One approach would be to make a fresh copy of the `Doc` objects within
`nlp.update()`, so that we can write to the objects without any
problems. If we do that, we can drop this logic and also drop the
`set_annotations` mechanism. I would be fine with that approach,
although it runs the risk of introducing some performance overhead, and
we'll have to take care to copy all extension attributes etc.
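The dependency guess described in this message can be sketched roughly as follows. This is an illustration only, not the code from the commit: the `Component` class, the `guess_set_annotations` function, and the attribute strings are hypothetical stand-ins, and the sketch assumes each component simply declares the attributes it assigns and requires.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Component:
    """Hypothetical stand-in for a pipeline component's declared metadata."""
    name: str
    assigns: List[str] = field(default_factory=list)
    requires: List[str] = field(default_factory=list)


def guess_set_annotations(pipeline: List[Component]) -> Dict[str, bool]:
    """For each component, guess True if any later component requires
    an attribute that this component assigns."""
    result = {}
    for i, comp in enumerate(pipeline):
        # Collect everything that components further down the pipeline need.
        needed_later = set()
        for later in pipeline[i + 1:]:
            needed_later.update(later.requires)
        result[comp.name] = bool(needed_later.intersection(comp.assigns))
    return result


# Assumed example pipeline: the parser uses tagger output during training.
pipeline = [
    Component("tagger", assigns=["token.tag"]),
    Component("parser", assigns=["token.dep", "token.head"], requires=["token.tag"]),
    Component("ner", assigns=["doc.ents"]),
]
flags = guess_set_annotations(pipeline)
print(flags)
```

In this sketch only the tagger would be asked to write its annotations, since the parser is the only component that declares a requirement on something an earlier component assigns.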
2020-05-22 15:55:45 +02:00
Ines Montani
6e6db6afb6
Better model compatibility and validation
2020-05-22 15:42:46 +02:00
Matthw Honnibal
25b51f4fc8
Set version to v3.0.0.dev9
2020-05-21 20:47:52 +02:00
Matthw Honnibal
bc94fdabd0
Fix begin_training
2020-05-21 20:46:21 +02:00