Commit Graph

7076 Commits

Author SHA1 Message Date
svlandeg
2d9f406188 fix test_cli 2020-06-17 14:42:48 +02:00
svlandeg
f7ad8e8c83 various fixes in scripts - needs to be further tested 2020-06-17 12:05:58 +02:00
svlandeg
3c4f9e4cc4 fix augment (needs further testing) 2020-06-17 10:46:29 +02:00
svlandeg
4ed399c848 minibatch utility can deal with strings, docs or examples 2020-06-16 21:35:55 +02:00
svlandeg
8b66c11ff2 add spaces to json output format 2020-06-16 19:30:03 +02:00
svlandeg
ba80ad7efd fixed some tests + WIP roundtrip unit test 2020-06-16 18:26:50 +02:00
svlandeg
43d41d6bb6 allow None as BILUO annotation 2020-06-16 15:30:05 +02:00
svlandeg
44a0f9c2c8 test_gold_biluo_different_tokenization works 2020-06-16 15:21:20 +02:00
svlandeg
1c35b8efcd fix spaces 2020-06-16 12:08:25 +02:00
svlandeg
6fea5fa4bd attempt to fix cases with weird spaces 2020-06-16 11:52:29 +02:00
svlandeg
0702a1d3fb fix test for misaligned 2020-06-15 23:10:47 +02:00
svlandeg
a28f8f369e Fix many-to-one IOB codes 2020-06-15 23:06:22 +02:00
svlandeg
12886b787b fixing NER one-to-many alignment 2020-06-15 22:44:17 +02:00
Matthew Honnibal
a0bf73a5dd Merge branch 'whatif/arrow' of https://github.com/explosion/spaCy into whatif/arrow 2020-06-15 18:16:01 +02:00
Matthew Honnibal
c66f93299e Remove TokenAnnotation code from nonproj 2020-06-15 18:14:47 +02:00
Matthew Honnibal
c95494739c Fix import 2020-06-15 18:11:10 +02:00
Matthew Honnibal
8f978f2031 Fix import 2020-06-15 18:10:47 +02:00
Matthew Honnibal
95de7efaad Draft create_gold_state for arc_eager oracle 2020-06-15 18:10:19 +02:00
svlandeg
68986a252e additional tests for new get_aligned function 2020-06-15 17:42:40 +02:00
svlandeg
41d29983a7 start testing get_aligned 2020-06-15 17:16:01 +02:00
svlandeg
fd5f199feb fixing language and scoring tests 2020-06-15 15:02:05 +02:00
svlandeg
b4d914ec77 fix error catching 2020-06-15 12:56:32 +02:00
svlandeg
b9c9cbb2cd informative error when calling to_array with wrong field 2020-06-15 11:53:31 +02:00
svlandeg
ff231e1cdd fix merge conflict 2020-06-15 09:04:19 +02:00
svlandeg
a48553c1ed fix error numbers 2020-06-15 08:51:31 +02:00
Matthew Honnibal
3c0fc10dc4 Remove beam for now (maybe)
Remove beam_utils

Update setup.py

Remove beam
2020-06-14 19:53:29 +02:00
Matthew Honnibal
98ca14f577 Remove GoldParse
WIP on removing goldparse

Get ArcEager compiling after GoldParse excise

Update setup.py

Get spacy.syntax compiling after removing GoldParse

Rename NewExample -> Example and clean up

Clean html files

Start updating tests

Update Morphologizer
2020-06-14 19:53:30 +02:00
Matthew Honnibal
d53723aa4f Merge from whatif/arrow 2020-06-14 17:43:59 +02:00
Matthew Honnibal
380cce9d8b Update errors 2020-06-14 17:40:05 +02:00
Matthew Honnibal
706e652820 Merge from develop 2020-06-14 17:35:01 +02:00
Matthew Honnibal
9296d71a54 More GoldParse excise 2020-06-14 17:26:54 +02:00
Matthew Honnibal
60d4e5a9e0 WIP on updating transition-system 2020-06-14 17:22:14 +02:00
Matthew Honnibal
7d65615625 WIP start excising GoldParse 2020-06-14 17:11:41 +02:00
Matthew Honnibal
4362ec7084 Hack Language.evaluate 2020-06-13 23:37:42 +02:00
Matthew Honnibal
7de997c0a5 Update test 2020-06-13 23:11:45 +02:00
Matthew Honnibal
8f941ef527 Update GoldParse 2020-06-13 23:11:29 +02:00
Matthew Honnibal
3a0bbcfb4c Add biluo_tags_from_doc function 2020-06-13 23:10:54 +02:00
Matthew Honnibal
caa7508725 Draft missing NewExample stuff 2020-06-13 23:10:21 +02:00
Matthew Honnibal
3eb8f3867e Update test 2020-06-13 23:05:16 +02:00
Matthew Honnibal
5564314d32 Suggest approach for GoldParse 2020-06-13 15:43:35 +02:00
Matthew Honnibal
b078b05ecd Handle various data better in NewExample 2020-06-13 15:30:12 +02:00
svlandeg
face0de74f fix MORPH conversion + enable unit test 2020-06-12 16:29:09 +02:00
svlandeg
a5ee082da1 cats bugfix 2020-06-12 15:49:38 +02:00
svlandeg
880dccf93e entities on doc_annotation, parse links and check their offsets against the entities. unit test works 2020-06-12 15:47:20 +02:00
svlandeg
3aed177a35 fix ENT_IOB conversion and enable unit test 2020-06-12 11:30:24 +02:00
Matthew Honnibal
a1c5b694be Small fixes to train defaults 2020-06-12 02:22:13 +02:00
Sofie Van Landeghem
c0f4a1e43b
train is from-config by default (#5575)
* verbose and tag_map options

* adding init_tok2vec option and only changing the tok2vec that is specified

* adding omit_extra_lookups and verifying textcat config

* wip

* pretrain bugfix

* add replace and resume options

* train_textcat fix

* raw text functionality

* improve UX when KeyError or when input data can't be parsed

* avoid unnecessary access to goldparse in TextCat pipe

* save performance information in nlp.meta

* add noise_level to config

* move nn_parser's defaults to config file

* multitask in config - doesn't work yet

* scorer offering both F and AUC options, need to be specified in config

* add textcat verification code from old train script

* small fixes to config files

* clean up

* set default config for ner/parser to allow create_pipe to work as before

* two more test fixes

* small fixes

* cleanup

* fix NER pickling + additional unit test

* create_pipe as before
2020-06-12 02:02:07 +02:00
svlandeg
6a67a11682 adding tests for new example class (some still failing - WIP) 2020-06-11 17:43:40 +02:00
Matthew Honnibal
488727aee0 Start updating test 2020-06-09 23:58:28 +02:00
Matthew Honnibal
337d2b5ad6 Fix sent start in NewExample 2020-06-09 23:58:16 +02:00
Matthew Honnibal
ad547a4b8f Refactor towards new Example class 2020-06-09 23:39:46 +02:00
Matthew Honnibal
82810b9846 Update morphologizer 2020-06-09 23:32:07 +02:00
Matthew Honnibal
af1b5f129b Use new example class in GoldCorpus 2020-06-09 23:31:19 +02:00
Matthew Honnibal
0714f1fa5c Remove the 'pass example into __call__' thing 2020-06-09 23:30:06 +02:00
Matthew Honnibal
b3868cd1f8 Update NewExample 2020-06-09 23:06:48 +02:00
Matthew Honnibal
ccd332a9fc Update test stubs 2020-06-09 15:49:04 +02:00
Matthew Honnibal
04569c0b3e Fix import 2020-06-09 15:44:08 +02:00
Matthew Honnibal
f4caaa8ad9 Update alignment 2020-06-09 15:43:57 +02:00
Matthew Honnibal
b5ef397639 Add header for align.pxd 2020-06-09 15:43:48 +02:00
Matthew Honnibal
793092d2d8 Fix renaming in GoldCorpus 2020-06-09 15:43:38 +02:00
Matthew Honnibal
36d49a0f13 Fix NewExample class 2020-06-09 15:43:19 +02:00
Matthew Honnibal
f1189dc205 Draft tests for new Example class 2020-06-09 15:43:08 +02:00
Matthew Honnibal
c833ebe1ad Start tests for new example class 2020-06-09 15:29:05 +02:00
Matthew Honnibal
453cfa14d0 Start drafting new example class 2020-06-09 15:28:42 +02:00
Matthew Honnibal
449000c234 Fix gold_io 2020-06-09 12:43:53 +02:00
Matthew Honnibal
cb08ce3936 Move alignment into Cython 2020-06-09 12:40:41 +02:00
Matthew Honnibal
20a1bdb298 Fix train 2020-06-09 12:33:29 +02:00
Matthew Honnibal
549164c31c Fix corpus when no raw text supplied 2020-06-09 12:33:14 +02:00
Matthew Honnibal
d9289712ba * Make GoldCorpus return dict, not Example
* Make Example require a Doc object (previously optional)

Clarify methods in GoldCorpus

WIP refactor Example

Refactor Example.split_sents

Fix test

Fix augment

Update test

Update test

Fix import

Update test_scorer

Update Example
2020-06-09 01:01:59 +02:00
Matthew Honnibal
084271c9e9
Remove GoldParse from public API
* Move get_parses_from_example to spacy.syntax

* Get GoldParse out of Example

* Avoid expecting GoldParse input in parser

* Add Alignment to spacy.gold.align

* Update Example object

* Add comment

* Update pipeline

* Fix imports

* Simplify gold_io

* WIP on GoldCorpus

* Update test

* Xfail some gold tests

* Remove ignore_misaligned option from GoldCorpus

* Fix Example constructor

* Update test

* Fix usage of Example

* Add deprecated_get_gold method on Example

* Patch scorer

* Fix test

* Fix test

* Update tests

* Xfail a test

* Fix passing of make_projective

* Pass make_projective by default

* Hack data format in Example.from_dict

* Update tests

* Fix example.from_dict

* Update morphologizer

* Fix entity linker

* Add get_field to TokenAnnotation

* Fix Example.get_aligned

* Update test

* Fix alignment

* Fix corpus

* Fix GoldCorpus

* Handle misaligned

* Format

* Fix missing import
2020-06-08 22:09:57 +02:00
Matthew Honnibal
b69fa77ccc Add missing inits 2020-06-06 15:38:46 +02:00
Matthew Honnibal
6e87ca1f45 Fix imports 2020-06-06 15:36:58 +02:00
Matthew Honnibal
53b00991fd Fix imports 2020-06-06 15:36:46 +02:00
Matthew Honnibal
74204116a3 Rename _gold -> gold 2020-06-06 15:29:32 +02:00
Matthew Honnibal
7f135736f4 Fix imports 2020-06-06 15:28:52 +02:00
Matthew Honnibal
17533a9286 Format 2020-06-06 15:13:07 +02:00
Matthew Honnibal
0f9b4bbfea Fix imports 2020-06-06 15:12:52 +02:00
Matthew Honnibal
866179350b Fix import 2020-06-06 15:11:13 +02:00
Matthew Honnibal
3baa1ada03 Refactor spacy.gold 2020-06-06 15:10:33 +02:00
Matthew Honnibal
1d2e39d974 Support to_dict in Doc 2020-06-06 15:10:10 +02:00
Matthew Honnibal
7b873ce2b1 Move GoldParse under spacy.syntax 2020-06-06 15:09:43 +02:00
Matthew Honnibal
32c8fb1372 Add gold_io.pyx 2020-06-06 14:41:49 +02:00
Matthew Honnibal
156466ca69 Add iob_utils 2020-06-06 14:39:14 +02:00
Matthew Honnibal
53e6473e24 Add to/from dict helpers 2020-06-06 14:29:06 +02:00
Matthew Honnibal
a663d44b1b Add GoldCorpus 2020-06-06 14:28:37 +02:00
Matthew Honnibal
1fb8fc6ea9 Add Example class 2020-06-06 14:24:35 +02:00
Matthew Honnibal
cce6a51a9c Add annotation classes 2020-06-06 14:22:27 +02:00
Matthew Honnibal
6005b94e74 Add data augmentation 2020-06-06 14:19:06 +02:00
Matthew Honnibal
fcb4f7a6db Start breaking down gold.pyx 2020-06-06 14:15:12 +02:00
Ines Montani
d93cbeb14f
Add warning for loose version constraints (#5536)
* Add warning for loose version constraints

* Update wording [ci skip]

* Tweak error message

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-06-05 12:42:15 +02:00
Matthew Honnibal
8411d4f4e6
Merge pull request #5543 from svlandeg/feature/pretrain-config
pretrain from config
2020-06-04 19:07:12 +02:00
svlandeg
3ade455fd3 formatting 2020-06-04 16:09:55 +02:00
svlandeg
776d4f1190 cleanup 2020-06-04 16:07:30 +02:00
svlandeg
6b027d7689 remove duplicate model definition of tok2vec layer 2020-06-04 15:49:23 +02:00
svlandeg
1775f54a26 small little fixes 2020-06-03 22:17:02 +02:00
svlandeg
07886a3de3 rename init_tok2vec to resume 2020-06-03 22:00:25 +02:00
svlandeg
4ed6278663 small fixes to pretrain config, init_tok2vec TODO 2020-06-03 19:32:40 +02:00
svlandeg
ffe0451d09 pretrain from config 2020-06-03 14:45:00 +02:00
Ines Montani
a8875d4a4b Fix typo 2020-06-03 14:42:39 +02:00
Ines Montani
4e0610d0d4 Update warning codes 2020-06-03 14:37:09 +02:00