7000 Commits

Author SHA1 Message Date
Matthew Honnibal
98ca14f577 Remove GoldParse
WIP on removing goldparse

Get ArcEager compiling after GoldParse excise

Update setup.py

Get spacy.syntax compiling after removing GoldParse

Rename NewExample -> Example and clean up

Clean html files

Start updating tests

Update Morphologizer
2020-06-14 19:53:30 +02:00
Matthew Honnibal
d53723aa4f Merge from whatif/arrow 2020-06-14 17:43:59 +02:00
Matthew Honnibal
380cce9d8b Update errors 2020-06-14 17:40:05 +02:00
Matthew Honnibal
706e652820 Merge from develop 2020-06-14 17:35:01 +02:00
Matthew Honnibal
9296d71a54 More GoldParse excise 2020-06-14 17:26:54 +02:00
Matthew Honnibal
60d4e5a9e0 WIP on updating transition-system 2020-06-14 17:22:14 +02:00
Matthew Honnibal
7d65615625 WIP start excising GoldParse 2020-06-14 17:11:41 +02:00
Matthew Honnibal
4362ec7084 Hack Language.evaluate 2020-06-13 23:37:42 +02:00
Matthew Honnibal
7de997c0a5 Update test 2020-06-13 23:11:45 +02:00
Matthew Honnibal
8f941ef527 Update GoldParse 2020-06-13 23:11:29 +02:00
Matthew Honnibal
3a0bbcfb4c Add biluo_tags_from_doc function 2020-06-13 23:10:54 +02:00
Matthew Honnibal
caa7508725 Draft missing NewExample stuff 2020-06-13 23:10:21 +02:00
Matthew Honnibal
3eb8f3867e Update test 2020-06-13 23:05:16 +02:00
Matthew Honnibal
5564314d32 Suggest approach for GoldParse 2020-06-13 15:43:35 +02:00
Matthew Honnibal
b078b05ecd Handle various data better in NewExample 2020-06-13 15:30:12 +02:00
svlandeg
face0de74f fix MORPH conversion + enable unit test 2020-06-12 16:29:09 +02:00
svlandeg
a5ee082da1 cats bugfix 2020-06-12 15:49:38 +02:00
svlandeg
880dccf93e entities on doc_annotation, parse links and check their offsets against the entities. unit test works 2020-06-12 15:47:20 +02:00
svlandeg
3aed177a35 fix ENT_IOB conversion and enable unit test 2020-06-12 11:30:24 +02:00
Matthew Honnibal
a1c5b694be Small fixes to train defaults 2020-06-12 02:22:13 +02:00
Sofie Van Landeghem
c0f4a1e43b train is from-config by default (#5575)
* verbose and tag_map options

* adding init_tok2vec option and only changing the tok2vec that is specified

* adding omit_extra_lookups and verifying textcat config

* wip

* pretrain bugfix

* add replace and resume options

* train_textcat fix

* raw text functionality

* improve UX when KeyError or when input data can't be parsed

* avoid unnecessary access to goldparse in TextCat pipe

* save performance information in nlp.meta

* add noise_level to config

* move nn_parser's defaults to config file

* multitask in config - doesn't work yet

* scorer offering both F and AUC options, need to be specified in config

* add textcat verification code from old train script

* small fixes to config files

* clean up

* set default config for ner/parser to allow create_pipe to work as before

* two more test fixes

* small fixes

* cleanup

* fix NER pickling + additional unit test

* create_pipe as before
2020-06-12 02:02:07 +02:00
svlandeg
6a67a11682 adding tests for new example class (some still failing - WIP) 2020-06-11 17:43:40 +02:00
Matthew Honnibal
488727aee0 Start updating test 2020-06-09 23:58:28 +02:00
Matthew Honnibal
337d2b5ad6 Fix sent start in NewExample 2020-06-09 23:58:16 +02:00
Matthew Honnibal
ad547a4b8f Refactor towards new Example class 2020-06-09 23:39:46 +02:00
Matthew Honnibal
82810b9846 Update morphologizer 2020-06-09 23:32:07 +02:00
Matthew Honnibal
af1b5f129b Use new example class in GoldCorpus 2020-06-09 23:31:19 +02:00
Matthew Honnibal
0714f1fa5c Remove the 'pass example into __call__' thing 2020-06-09 23:30:06 +02:00
Matthew Honnibal
b3868cd1f8 Update NewExample 2020-06-09 23:06:48 +02:00
Matthew Honnibal
ccd332a9fc Update test stubs 2020-06-09 15:49:04 +02:00
Matthew Honnibal
04569c0b3e Fix import 2020-06-09 15:44:08 +02:00
Matthew Honnibal
f4caaa8ad9 Update alignment 2020-06-09 15:43:57 +02:00
Matthew Honnibal
b5ef397639 Add header for align.pxd 2020-06-09 15:43:48 +02:00
Matthew Honnibal
793092d2d8 Fix renaming in GoldCorpus 2020-06-09 15:43:38 +02:00
Matthew Honnibal
36d49a0f13 Fix NewExample class 2020-06-09 15:43:19 +02:00
Matthew Honnibal
f1189dc205 Draft tests for new Example class 2020-06-09 15:43:08 +02:00
Matthew Honnibal
c833ebe1ad Start tests for new example class 2020-06-09 15:29:05 +02:00
Matthew Honnibal
453cfa14d0 Start drafting new example class 2020-06-09 15:28:42 +02:00
Matthew Honnibal
449000c234 Fix gold_io 2020-06-09 12:43:53 +02:00
Matthew Honnibal
cb08ce3936 Move alignment into Cython 2020-06-09 12:40:41 +02:00
Matthew Honnibal
20a1bdb298 Fix train 2020-06-09 12:33:29 +02:00
Matthew Honnibal
549164c31c Fix corpus when no raw text supplied 2020-06-09 12:33:14 +02:00
Matthew Honnibal
d9289712ba * Make GoldCorpus return dict, not Example
* Make Example require a Doc object (previously optional)

Clarify methods in GoldCorpus

WIP refactor Example

Refactor Example.split_sents

Fix test

Fix augment

Update test

Update test

Fix import

Update test_scorer

Update Example
2020-06-09 01:01:59 +02:00
Matthew Honnibal
084271c9e9 Remove GoldParse from public API
* Move get_parses_from_example to spacy.syntax

* Get GoldParse out of Example

* Avoid expecting GoldParse input in parser

* Add Alignment to spacy.gold.align

* Update Example object

* Add comment

* Update pipeline

* Fix imports

* Simplify gold_io

* WIP on GoldCorpus

* Update test

* Xfail some gold tests

* Remove ignore_misaligned option from GoldCorpus

* Fix Example constructor

* Update test

* Fix usage of Example

* Add deprecated_get_gold method on Example

* Patch scorer

* Fix test

* Fix test

* Update tests

* Xfail a test

* Fix passing of make_projective

* Pass make_projective by default

* Hack data format in Example.from_dict

* Update tests

* Fix example.from_dict

* Update morphologizer

* Fix entity linker

* Add get_field to TokenAnnotation

* Fix Example.get_aligned

* Update test

* Fix alignment

* Fix corpus

* Fix GoldCorpus

* Handle misaligned

* Format

* Fix missing import
2020-06-08 22:09:57 +02:00
Matthew Honnibal
b69fa77ccc Add missing inits 2020-06-06 15:38:46 +02:00
Matthew Honnibal
6e87ca1f45 Fix imports 2020-06-06 15:36:58 +02:00
Matthew Honnibal
53b00991fd Fix imports 2020-06-06 15:36:46 +02:00
Matthew Honnibal
74204116a3 Rename _gold -> gold 2020-06-06 15:29:32 +02:00
Matthew Honnibal
7f135736f4 Fix imports 2020-06-06 15:28:52 +02:00
Matthew Honnibal
17533a9286 Format 2020-06-06 15:13:07 +02:00