7275 Commits

Author SHA1 Message Date
Matthew Honnibal
2efe01bf26 Fix parser declaration 2020-06-22 00:54:38 +02:00
Matthew Honnibal
29d39d8a34 Update header 2020-06-22 00:54:38 +02:00
Matthew Honnibal
456e27dc8b Start debugging arc_eager oracle 2020-06-22 00:54:38 +02:00
Matthew Honnibal
b60eede321 Fix parser model 2020-06-22 00:54:38 +02:00
Matthew Honnibal
17efd6bfec Update train.py 2020-06-22 00:54:38 +02:00
Matthew Honnibal
49145b9ec1 Update DocBin
Add missing strings when serializing
2020-06-22 00:54:35 +02:00
Matthew Honnibal
17226a60ac Draft Corpus class for DocBin
Update Corpus

Fix Corpus
2020-06-22 00:51:22 +02:00
Matthew Honnibal
6e7a7ab6da Work on train script 2020-06-22 00:48:09 +02:00
Matthew Honnibal
a5ebfb20f5 Serialize all attrs by default
Move converters under spacy.gold

Move things around

Fix naming

Fix name

Update converter to produce DocBin

Update converters

Make spacy convert output docbin

Fix import

Fix docbin

Fix import

Update converter

Remove jsonl converter

Add json2docs converter
2020-06-22 00:46:08 +02:00
Matthew Honnibal
5467cb4aae Allow DocBin to take list of Doc objects. 2020-06-22 00:46:08 +02:00
Matthew Honnibal
d422f30a18 Start updating converters 2020-06-22 00:46:12 +02:00
svlandeg
6d5bfd6f6a fix test checking for variants 2020-06-22 00:46:08 +02:00
svlandeg
a427ca9355 clean up 2020-06-22 00:46:08 +02:00
svlandeg
5477bf054f add links to to_dict 2020-06-22 00:46:08 +02:00
Matthew Honnibal
39117de4f9 Fix compile in ArcEager 2020-06-22 00:46:08 +02:00
Matthew Honnibal
e2279eab1c Make doc.from_array several times faster 2020-06-22 00:46:08 +02:00
Matthew Honnibal
de32515bf8 Allocate Doc before starting to add words 2020-06-22 00:46:08 +02:00
Ines Montani
79dd824906 Tidy up 2020-06-22 00:45:40 +02:00
Ines Montani
1e5b4d8524 Fix DVC check 2020-06-22 00:30:05 +02:00
Ines Montani
5ba1df5e78 Update project CLI 2020-06-22 00:15:06 +02:00
Ines Montani
ef5f548fb0 Tidy up and auto-format 2020-06-21 22:38:04 +02:00
Ines Montani
f77e0bc028 Merge branch 'develop' into master-tmp 2020-06-21 22:34:15 +02:00
Ines Montani
40bb918a4c Remove unicode declarations and tidy up 2020-06-21 22:34:10 +02:00
Ines Montani
275bab62df Refactor CLI 2020-06-21 21:35:01 +02:00
Ines Montani
c12713a8be Port CLI to Typer and add project stubs 2020-06-21 13:44:00 +02:00
Matthew Honnibal
6670c44390 Unskip tests 2020-06-21 01:17:52 +02:00
Matthew Honnibal
90d9f04e0b Unskip 2020-06-21 01:16:33 +02:00
Matthew Honnibal
2b180ea033 Update test 2020-06-21 01:15:41 +02:00
Matthew Honnibal
192b94f0a1 Remove beam test 2020-06-21 01:15:12 +02:00
Matthew Honnibal
9db66ddd48 Update test_arc_eager_oracle 2020-06-21 01:12:28 +02:00
Matthew Honnibal
7544c21f5b Update transition system 2020-06-21 01:12:05 +02:00
Matthew Honnibal
318a046fb0 Restore ArcEager.get_cost function 2020-06-21 01:11:08 +02:00
Matthew Honnibal
e90341810c Update arc_eager oracle 2020-06-21 01:04:02 +02:00
Matthew Honnibal
c58deb3546 Work on parser oracle 2020-06-21 01:01:09 +02:00
svlandeg
689600e17d add additional test back in (it works now) 2020-06-20 23:23:57 +02:00
svlandeg
2f6062a8a4 add line that got removed from EntityLinker 2020-06-20 23:14:45 +02:00
svlandeg
12dc8ab208 remove redundant code from master in EntityLinker 2020-06-20 23:07:42 +02:00
svlandeg
6179774278 fix test_build_dependencies by ignoring new libs 2020-06-20 22:49:37 +02:00
svlandeg
256d4c27c8 fix tagger begin_training being called without examples 2020-06-20 22:38:00 +02:00
Matthew Honnibal
914924a68b Fix mimport 2020-06-20 22:22:40 +02:00
Matthew Honnibal
2791c1c0dc Fix module name of corpus 2020-06-20 22:22:14 +02:00
Matthew Honnibal
4bbc277758 Update after removing GoldCorpus 2020-06-20 22:21:24 +02:00
Matthew Honnibal
64d00520e2 Update imports 2020-06-20 22:21:08 +02:00
Matthew Honnibal
cfd024536d Remove GoldCorpus 2020-06-20 22:13:37 +02:00
Matthew Honnibal
fd83551eb5 Skip test causing segfault 2020-06-20 22:11:27 +02:00
svlandeg
5cb812e0ab fix NER warn empty lookups (cf PR #5588) 2020-06-20 22:04:18 +02:00
Matthew Honnibal
095710e40e Skip tests that cause crashes 2020-06-20 22:02:32 +02:00
Matthew Honnibal
0b23fd3891 Xfail some tests 2020-06-20 21:52:57 +02:00
Matthew Honnibal
6af99f2f2d Fix parser declaration 2020-06-20 21:50:17 +02:00
Matthew Honnibal
52edb24f07 Update header 2020-06-20 21:50:06 +02:00
Matthew Honnibal
0c10831b14 Start debugging arc_eager oracle 2020-06-20 21:49:46 +02:00
Matthew Honnibal
2bcb5881d7 Fix parser model 2020-06-20 21:49:31 +02:00
Matthew Honnibal
396dd60b3a Fix Corpus 2020-06-20 21:49:15 +02:00
Matthew Honnibal
450c6fe39c Update train.py 2020-06-20 21:49:06 +02:00
svlandeg
c9242e9bf4 fix entity linker (cf PR #5548) 2020-06-20 21:47:23 +02:00
svlandeg
dc069e90b3 fix token.morph_ for v.3 (cf PR #5517) 2020-06-20 21:13:11 +02:00
Matthew Honnibal
6d821b2e55 Make doc.from_array several times faster 2020-06-20 20:17:13 +02:00
Matthew Honnibal
fa86aa581d Allocate Doc before starting to add words 2020-06-20 20:15:21 +02:00
Matthew Honnibal
652f31d3ee Update DocBin 2020-06-20 20:12:54 +02:00
Matthew Honnibal
0a8b6631a2 Update Corpus 2020-06-20 20:12:31 +02:00
Matthew Honnibal
11fa0658f7 Work on train script 2020-06-20 20:12:19 +02:00
Ines Montani
988d2a4eda Add --code-path option to train CLI (#5618) 2020-06-20 18:43:12 +02:00
Matthew Honnibal
0de361cd00 Draft Corpus class for DocBin 2020-06-20 18:31:07 +02:00
Ines Montani
5424b70e51 Remove v2 test 2020-06-20 16:18:53 +02:00
Ines Montani
63c22969f4 Update test_issue5230.py 2020-06-20 16:17:48 +02:00
Ines Montani
296b5d633b Remove references to Python 2 / is_python2 2020-06-20 16:11:13 +02:00
Matthew Honnibal
7360d3db72 Add json2docs converter 2020-06-20 16:02:53 +02:00
Ines Montani
0cdb631e6c Fix merge errors 2020-06-20 16:02:42 +02:00
Matthew Honnibal
f1756a6a22 Remove jsonl converter 2020-06-20 16:02:40 +02:00
Matthew Honnibal
5d89b1840e Update converter 2020-06-20 16:00:14 +02:00
Matthew Honnibal
f5780cb160 Serialize all attrs by default 2020-06-20 15:59:39 +02:00
Matthew Honnibal
3241acbe0b Fix import 2020-06-20 15:56:28 +02:00
Matthew Honnibal
b7a366b435 Fix compile in ArcEager 2020-06-20 15:56:16 +02:00
Matthew Honnibal
91fa2f1126 Fix docbin 2020-06-20 15:56:05 +02:00
Matthew Honnibal
476bcd4c53 Fix import 2020-06-20 15:55:57 +02:00
Matthew Honnibal
7a846921a3 Make spacy convert output docbin 2020-06-20 15:55:35 +02:00
Ines Montani
52728d8fa3 Merge branch 'develop' into master-tmp 2020-06-20 15:52:00 +02:00
Ines Montani
f91e9e8c84 Remove F841 [ci skip] 2020-06-20 14:47:17 +02:00
Ines Montani
8283df80e9 Tidy up and auto-format 2020-06-20 14:15:04 +02:00
Matthew Honnibal
0d22c6e006 Allow DocBin to take list of Doc objects. 2020-06-20 03:50:36 +02:00
Matthew Honnibal
95df028758 Update converters 2020-06-20 03:50:23 +02:00
Matthew Honnibal
3a73d95dcc Update converter to produce DocBin 2020-06-20 03:50:13 +02:00
Matthew Honnibal
d9a8fdf4b7 Fix name 2020-06-20 03:26:36 +02:00
Matthew Honnibal
e20a780867 Fix naming 2020-06-20 03:24:49 +02:00
Matthew Honnibal
f61d5e3ac3 Move things around 2020-06-20 03:23:58 +02:00
Matthew Honnibal
c630cfdb5e Move converters under spacy.gold 2020-06-20 03:20:34 +02:00
Matthew Honnibal
161d8439fa Start updating converters 2020-06-20 03:19:40 +02:00
Matthew Honnibal
a79f0598a6 Merge branch 'whatif/arrow' of https://github.com/explosion/spaCy into whatif/arrow 2020-06-20 02:36:40 +02:00
Matthew Honnibal
be81577719 Fix oracles 2020-06-20 02:36:12 +02:00
Marat M. Yavrumyan
8120b641cc Update lex_attrs.py (#5608) 2020-06-19 20:00:34 +02:00
svlandeg
e30ec9b2a8 fix test checking for variants 2020-06-19 14:05:35 +02:00
svlandeg
25b0674320 clean up 2020-06-19 11:31:01 +02:00
svlandeg
c705a28438 add links to to_dict 2020-06-19 11:22:24 +02:00
Matthew Honnibal
03db143cd0 Draft new GoldCorpus class 2020-06-19 04:15:02 +02:00
Matthew Honnibal
a389866df6 Merge branch 'whatif/arrow' of https://github.com/explosion/spaCy into whatif/arrow 2020-06-19 02:30:27 +02:00
Matthew Honnibal
bd29b7b14f Update parser and NER gold stuff 2020-06-19 02:29:16 +02:00
Matthew Honnibal
5ae9e3480d Return ArcEagerGoldParse from ArcEager 2020-06-19 00:11:59 +02:00
svlandeg
6ca6d7d6b4 test for split sentences with various alignment issues, works 2020-06-18 20:01:02 +02:00
svlandeg
1951921230 implement split_sent with aligned SENT_START attribute 2020-06-18 19:41:53 +02:00
svlandeg
d1d6f16776 fix the fix 2020-06-18 19:15:32 +02:00