Commit Graph

7325 Commits

Author SHA1 Message Date
Matthew Honnibal
2efe01bf26 Fix parser declaration 2020-06-22 00:54:38 +02:00
Matthew Honnibal
29d39d8a34 Update header 2020-06-22 00:54:38 +02:00
Matthew Honnibal
456e27dc8b Start debugging arc_eager oracle 2020-06-22 00:54:38 +02:00
Matthew Honnibal
b60eede321 Fix parser model 2020-06-22 00:54:38 +02:00
Matthew Honnibal
17efd6bfec Update train.py 2020-06-22 00:54:38 +02:00
Matthew Honnibal
49145b9ec1 Update DocBin
Add missing strings when serializing
2020-06-22 00:54:35 +02:00
Matthew Honnibal
17226a60ac Draft Corpus class for DocBin
Update Corpus

Fix Corpus
2020-06-22 00:51:22 +02:00
Matthew Honnibal
6e7a7ab6da Work on train script 2020-06-22 00:48:09 +02:00
Matthew Honnibal
a5ebfb20f5 Serialize all attrs by default
Move converters under spacy.gold

Move things around

Fix naming

Fix name

Update converter to produce DocBin

Update converters

Make spacy convert output docbin

Fix import

Fix docbin

Fix import

Update converter

Remove jsonl converter

Add json2docs converter
2020-06-22 00:46:08 +02:00
Matthew Honnibal
5467cb4aae Allow DocBin to take list of Doc objects. 2020-06-22 00:46:08 +02:00
Matthew Honnibal
d422f30a18 Start updating converters 2020-06-22 00:46:12 +02:00
svlandeg
6d5bfd6f6a fix test checking for variants 2020-06-22 00:46:08 +02:00
svlandeg
a427ca9355 clean up 2020-06-22 00:46:08 +02:00
svlandeg
5477bf054f add links to to_dict 2020-06-22 00:46:08 +02:00
Matthew Honnibal
39117de4f9 Fix compile in ArcEager 2020-06-22 00:46:08 +02:00
Matthew Honnibal
e2279eab1c Make doc.from_array several times faster 2020-06-22 00:46:08 +02:00
Matthew Honnibal
de32515bf8 Allocate Doc before starting to add words 2020-06-22 00:46:08 +02:00
Ines Montani
79dd824906 Tidy up 2020-06-22 00:45:40 +02:00
Ines Montani
1e5b4d8524 Fix DVC check 2020-06-22 00:30:05 +02:00
Ines Montani
5ba1df5e78 Update project CLI 2020-06-22 00:15:06 +02:00
Ines Montani
ef5f548fb0 Tidy up and auto-format 2020-06-21 22:38:04 +02:00
Ines Montani
f77e0bc028 Merge branch 'develop' into master-tmp 2020-06-21 22:34:15 +02:00
Ines Montani
40bb918a4c Remove unicode declarations and tidy up 2020-06-21 22:34:10 +02:00
Ines Montani
275bab62df Refactor CLI 2020-06-21 21:35:01 +02:00
Ines Montani
c12713a8be Port CLI to Typer and add project stubs 2020-06-21 13:44:00 +02:00
Matthew Honnibal
6670c44390 Unskip tests 2020-06-21 01:17:52 +02:00
Matthew Honnibal
90d9f04e0b Unskip 2020-06-21 01:16:33 +02:00
Matthew Honnibal
2b180ea033 Update test 2020-06-21 01:15:41 +02:00
Matthew Honnibal
192b94f0a1 Remove beam test 2020-06-21 01:15:12 +02:00
Matthew Honnibal
9db66ddd48 Update test_arc_eager_oracle 2020-06-21 01:12:28 +02:00
Matthew Honnibal
7544c21f5b Update transition system 2020-06-21 01:12:05 +02:00
Matthew Honnibal
318a046fb0 Restore ArcEager.get_cost function 2020-06-21 01:11:08 +02:00
Matthew Honnibal
e90341810c Update arc_eager oracle 2020-06-21 01:04:02 +02:00
Matthew Honnibal
c58deb3546 Work on parser oracle 2020-06-21 01:01:09 +02:00
svlandeg
689600e17d add additional test back in (it works now) 2020-06-20 23:23:57 +02:00
svlandeg
2f6062a8a4 add line that got removed from EntityLinker 2020-06-20 23:14:45 +02:00
svlandeg
12dc8ab208 remove redundant code from master in EntityLinker 2020-06-20 23:07:42 +02:00
svlandeg
6179774278 fix test_build_dependencies by ignoring new libs 2020-06-20 22:49:37 +02:00
svlandeg
256d4c27c8 fix tagger begin_training being called without examples 2020-06-20 22:38:00 +02:00
Matthew Honnibal
914924a68b Fix mimport 2020-06-20 22:22:40 +02:00
Matthew Honnibal
2791c1c0dc Fix module name of corpus 2020-06-20 22:22:14 +02:00
Matthew Honnibal
4bbc277758 Update after removing GoldCorpus 2020-06-20 22:21:24 +02:00
Matthew Honnibal
64d00520e2 Update imports 2020-06-20 22:21:08 +02:00
Matthew Honnibal
cfd024536d Remove GoldCorpus 2020-06-20 22:13:37 +02:00
Matthew Honnibal
fd83551eb5 Skip test causing segfault 2020-06-20 22:11:27 +02:00
svlandeg
5cb812e0ab fix NER warn empty lookups (cf PR #5588) 2020-06-20 22:04:18 +02:00
Matthew Honnibal
095710e40e Skip tests that cause crashes 2020-06-20 22:02:32 +02:00
Matthew Honnibal
0b23fd3891 Xfail some tests 2020-06-20 21:52:57 +02:00
Matthew Honnibal
6af99f2f2d Fix parser declaration 2020-06-20 21:50:17 +02:00
Matthew Honnibal
52edb24f07 Update header 2020-06-20 21:50:06 +02:00
Matthew Honnibal
0c10831b14 Start debugging arc_eager oracle 2020-06-20 21:49:46 +02:00
Matthew Honnibal
2bcb5881d7 Fix parser model 2020-06-20 21:49:31 +02:00
Matthew Honnibal
396dd60b3a Fix Corpus 2020-06-20 21:49:15 +02:00
Matthew Honnibal
450c6fe39c Update train.py 2020-06-20 21:49:06 +02:00
svlandeg
c9242e9bf4 fix entity linker (cf PR #5548) 2020-06-20 21:47:23 +02:00
svlandeg
dc069e90b3 fix token.morph_ for v.3 (cf PR #5517) 2020-06-20 21:13:11 +02:00
Matthew Honnibal
6d821b2e55 Make doc.from_array several times faster 2020-06-20 20:17:13 +02:00
Matthew Honnibal
fa86aa581d Allocate Doc before starting to add words 2020-06-20 20:15:21 +02:00
Matthew Honnibal
652f31d3ee Update DocBin 2020-06-20 20:12:54 +02:00
Matthew Honnibal
0a8b6631a2 Update Corpus 2020-06-20 20:12:31 +02:00
Matthew Honnibal
11fa0658f7 Work on train script 2020-06-20 20:12:19 +02:00
Ines Montani
988d2a4eda Add --code-path option to train CLI (#5618) 2020-06-20 18:43:12 +02:00
Matthew Honnibal
0de361cd00 Draft Corpus class for DocBin 2020-06-20 18:31:07 +02:00
Ines Montani
5424b70e51 Remove v2 test 2020-06-20 16:18:53 +02:00
Ines Montani
63c22969f4 Update test_issue5230.py 2020-06-20 16:17:48 +02:00
Ines Montani
296b5d633b Remove references to Python 2 / is_python2 2020-06-20 16:11:13 +02:00
Matthew Honnibal
7360d3db72 Add json2docs converter 2020-06-20 16:02:53 +02:00
Ines Montani
0cdb631e6c Fix merge errors 2020-06-20 16:02:42 +02:00
Matthew Honnibal
f1756a6a22 Remove jsonl converter 2020-06-20 16:02:40 +02:00
Matthew Honnibal
5d89b1840e Update converter 2020-06-20 16:00:14 +02:00
Matthew Honnibal
f5780cb160 Serialize all attrs by default 2020-06-20 15:59:39 +02:00
Matthew Honnibal
3241acbe0b Fix import 2020-06-20 15:56:28 +02:00
Matthew Honnibal
b7a366b435 Fix compile in ArcEager 2020-06-20 15:56:16 +02:00
Matthew Honnibal
91fa2f1126 Fix docbin 2020-06-20 15:56:05 +02:00
Matthew Honnibal
476bcd4c53 Fix import 2020-06-20 15:55:57 +02:00
Matthew Honnibal
7a846921a3 Make spacy convert output docbin 2020-06-20 15:55:35 +02:00
Ines Montani
52728d8fa3 Merge branch 'develop' into master-tmp 2020-06-20 15:52:00 +02:00
Ines Montani
f91e9e8c84 Remove F841 [ci skip] 2020-06-20 14:47:17 +02:00
Ines Montani
8283df80e9 Tidy up and auto-format 2020-06-20 14:15:04 +02:00
Matthew Honnibal
0d22c6e006 Allow DocBin to take list of Doc objects. 2020-06-20 03:50:36 +02:00
Matthew Honnibal
95df028758 Update converters 2020-06-20 03:50:23 +02:00
Matthew Honnibal
3a73d95dcc Update converter to produce DocBin 2020-06-20 03:50:13 +02:00
Matthew Honnibal
d9a8fdf4b7 Fix name 2020-06-20 03:26:36 +02:00
Matthew Honnibal
e20a780867 Fix naming 2020-06-20 03:24:49 +02:00
Matthew Honnibal
f61d5e3ac3 Move things around 2020-06-20 03:23:58 +02:00
Matthew Honnibal
c630cfdb5e Move converters under spacy.gold 2020-06-20 03:20:34 +02:00
Matthew Honnibal
161d8439fa Start updating converters 2020-06-20 03:19:40 +02:00
Matthew Honnibal
a79f0598a6 Merge branch 'whatif/arrow' of https://github.com/explosion/spaCy into whatif/arrow 2020-06-20 02:36:40 +02:00
Matthew Honnibal
be81577719 Fix oracles 2020-06-20 02:36:12 +02:00
Marat M. Yavrumyan
8120b641cc Update lex_attrs.py (#5608) 2020-06-19 20:00:34 +02:00
svlandeg
e30ec9b2a8 fix test checking for variants 2020-06-19 14:05:35 +02:00
svlandeg
25b0674320 clean up 2020-06-19 11:31:01 +02:00
svlandeg
c705a28438 add links to to_dict 2020-06-19 11:22:24 +02:00
Matthew Honnibal
03db143cd0 Draft new GoldCorpus class 2020-06-19 04:15:02 +02:00
Matthew Honnibal
a389866df6 Merge branch 'whatif/arrow' of https://github.com/explosion/spaCy into whatif/arrow 2020-06-19 02:30:27 +02:00
Matthew Honnibal
bd29b7b14f Update parser and NER gold stuff 2020-06-19 02:29:16 +02:00
Matthew Honnibal
5ae9e3480d Return ArcEagerGoldParse from ArcEager 2020-06-19 00:11:59 +02:00
svlandeg
6ca6d7d6b4 test for split sentences with various alignment issues, works 2020-06-18 20:01:02 +02:00
svlandeg
1951921230 implement split_sent with aligned SENT_START attribute 2020-06-18 19:41:53 +02:00
svlandeg
d1d6f16776 fix the fix 2020-06-18 19:15:32 +02:00
svlandeg
e822367cf7 prevent writing dummy values like deps because that could interfer with sent_start values 2020-06-18 17:47:59 +02:00
svlandeg
0b6d45eae1 various small fixes 2020-06-18 15:55:00 +02:00
svlandeg
1c71f2310c fix renames and simple_ner labels 2020-06-18 15:33:28 +02:00
svlandeg
64fc840a5d bugfix tok2vec 2020-06-18 15:24:40 +02:00
svlandeg
01f9ae774c small fixes 2020-06-18 14:01:19 +02:00
svlandeg
0c6f1f3891 fix BiluoPushDown parsing entities 2020-06-18 13:00:03 +02:00
svlandeg
cd790aaa2a fix parser tests to work with example (most still failing) 2020-06-18 11:19:22 +02:00
svlandeg
9f43ba839a throw informative error when running the components with the wrong type of objects 2020-06-18 10:36:05 +02:00
svlandeg
6712d0b5db textcat bugfix 2020-06-18 10:09:56 +02:00
svlandeg
40b2b21eef small bug fix 2020-06-17 23:33:51 +02:00
svlandeg
d6c4dd6eea pipe() takes docs, not examples 2020-06-17 21:29:36 +02:00
svlandeg
0f123af35e ensure test keeps working with non-linked entities 2020-06-17 21:13:38 +02:00
svlandeg
6d73e139b0 fix entity linker 2020-06-17 21:12:25 +02:00
svlandeg
be5934b827 fix tagger 2020-06-17 19:42:11 +02:00
svlandeg
10d396977e add support for MORPH in to/from_array, fix morphologizer overfitting test 2020-06-17 17:48:07 +02:00
svlandeg
1a151b10d6 correct silly typo 2020-06-17 14:48:14 +02:00
svlandeg
f6c451b650 cleanup 2020-06-17 14:45:54 +02:00
svlandeg
2d9f406188 fix test_cli 2020-06-17 14:42:48 +02:00
svlandeg
f7ad8e8c83 various fixes in scripts - needs to be further tested 2020-06-17 12:05:58 +02:00
svlandeg
3c4f9e4cc4 fix augment (needs further testing) 2020-06-17 10:46:29 +02:00
svlandeg
4ed399c848 minibatch utiltiy can deal with strings, docs or examples 2020-06-16 21:35:55 +02:00
svlandeg
8b66c11ff2 add spaces to json output format 2020-06-16 19:30:03 +02:00
svlandeg
ba80ad7efd fixed some tests + WIP roundtrip unit test 2020-06-16 18:26:50 +02:00
Ines Montani
e9d3e177f0 Merge branch 'master' into v2.3.x 2020-06-16 16:31:38 +02:00
svlandeg
43d41d6bb6 allow None as BILUO annotation 2020-06-16 15:30:05 +02:00
svlandeg
44a0f9c2c8 test_gold_biluo_different_tokenization works 2020-06-16 15:21:20 +02:00
svlandeg
1c35b8efcd fix spaces 2020-06-16 12:08:25 +02:00
svlandeg
6fea5fa4bd attempt to fix cases with weird spaces 2020-06-16 11:52:29 +02:00
svlandeg
0702a1d3fb fix test for misaligned 2020-06-15 23:10:47 +02:00
svlandeg
a28f8f369e Fix many-to-one IOB codes 2020-06-15 23:06:22 +02:00
svlandeg
12886b787b fixing NER one-to-many alignment 2020-06-15 22:44:17 +02:00
Matthew Honnibal
7ff447c5a0 Set version to v2.3.0 2020-06-15 18:22:25 +02:00
Matthew Honnibal
a0bf73a5dd Merge branch 'whatif/arrow' of https://github.com/explosion/spaCy into whatif/arrow 2020-06-15 18:16:01 +02:00
Matthew Honnibal
c66f93299e Remove TokenAnnotation code from nonproj 2020-06-15 18:14:47 +02:00
Matthew Honnibal
c95494739c Fix import 2020-06-15 18:11:10 +02:00
Matthew Honnibal
8f978f2031 Fix import 2020-06-15 18:10:47 +02:00
Matthew Honnibal
95de7efaad Draft create_gold_state for arc_eager oracle 2020-06-15 18:10:19 +02:00
svlandeg
68986a252e additional tests for new get_aligned function 2020-06-15 17:42:40 +02:00
svlandeg
41d29983a7 start testing get_aligned 2020-06-15 17:16:01 +02:00
svlandeg
fd5f199feb fixing language and scoring tests 2020-06-15 15:02:05 +02:00
Adriane Boyd
0d8405aafa Updates to docstrings (#5589) 2020-06-15 14:58:36 +02:00
Adriane Boyd
e867e9fa8f Fix and add warnings related to spacy-lookups-data (#5588)
* Fix warning message for lemmatization tables

* Add a warning when the `lexeme_norm` table is empty. (Given the
relatively lang-specific loading for `Lookups`, it seemed like too much
overhead to dynamically extract the list of languages, so for now it's
hard-coded.)
2020-06-15 14:58:29 +02:00
Arvind Srinivasan
f698007907 Added Tamil Example Sentences (#5583)
* Added Examples for Tamil Sentences

#### Description
This PR add example sentences for the Tamil language which were missing as per issue #1107 

#### Type of Change
This is an enhancement.

* Accepting spaCy Contributor Agreement

* Signed on my behalf as an individual
2020-06-15 14:58:21 +02:00
Adriane Boyd
c94f7d0e75 Updates to docstrings (#5589) 2020-06-15 14:56:51 +02:00
Adriane Boyd
c482f20778 Fix and add warnings related to spacy-lookups-data (#5588)
* Fix warning message for lemmatization tables

* Add a warning when the `lexeme_norm` table is empty. (Given the
relatively lang-specific loading for `Lookups`, it seemed like too much
overhead to dynamically extract the list of languages, so for now it's
hard-coded.)
2020-06-15 14:56:04 +02:00
svlandeg
b4d914ec77 fix error catching 2020-06-15 12:56:32 +02:00
svlandeg
b9c9cbb2cd informative error when calling to_array with wrong field 2020-06-15 11:53:31 +02:00
svlandeg
ff231e1cdd fix merge conflict 2020-06-15 09:04:19 +02:00
svlandeg
a48553c1ed fix error numbers 2020-06-15 08:51:31 +02:00
Matthew Honnibal
3c0fc10dc4 Remove beam for now (maybe)
Remove beam_utils

Update setup.py

Remove beam
2020-06-14 19:53:29 +02:00