Matthew Honnibal
2efe01bf26
Fix parser declaration
2020-06-22 00:54:38 +02:00
Matthew Honnibal
29d39d8a34
Update header
2020-06-22 00:54:38 +02:00
Matthew Honnibal
456e27dc8b
Start debugging arc_eager oracle
2020-06-22 00:54:38 +02:00
Matthew Honnibal
b60eede321
Fix parser model
2020-06-22 00:54:38 +02:00
Matthew Honnibal
17efd6bfec
Update train.py
2020-06-22 00:54:38 +02:00
Matthew Honnibal
49145b9ec1
Update DocBin
...
Add missing strings when serializing
2020-06-22 00:54:35 +02:00
Matthew Honnibal
17226a60ac
Draft Corpus class for DocBin
...
Update Corpus
Fix Corpus
2020-06-22 00:51:22 +02:00
Matthew Honnibal
6e7a7ab6da
Work on train script
2020-06-22 00:48:09 +02:00
Matthew Honnibal
a5ebfb20f5
Serialize all attrs by default
...
Move converters under spacy.gold
Move things around
Fix naming
Fix name
Update converter to produce DocBin
Update converters
Make spacy convert output docbin
Fix import
Fix docbin
Fix import
Update converter
Remove jsonl converter
Add json2docs converter
2020-06-22 00:46:08 +02:00
Matthew Honnibal
5467cb4aae
Allow DocBin to take list of Doc objects.
2020-06-22 00:46:08 +02:00
Matthew Honnibal
d422f30a18
Start updating converters
2020-06-22 00:46:12 +02:00
svlandeg
6d5bfd6f6a
fix test checking for variants
2020-06-22 00:46:08 +02:00
svlandeg
a427ca9355
clean up
2020-06-22 00:46:08 +02:00
svlandeg
5477bf054f
add links to to_dict
2020-06-22 00:46:08 +02:00
Matthew Honnibal
39117de4f9
Fix compile in ArcEager
2020-06-22 00:46:08 +02:00
Matthew Honnibal
e2279eab1c
Make doc.from_array several times faster
2020-06-22 00:46:08 +02:00
Matthew Honnibal
de32515bf8
Allocate Doc before starting to add words
2020-06-22 00:46:08 +02:00
Ines Montani
79dd824906
Tidy up
2020-06-22 00:45:40 +02:00
Ines Montani
1e5b4d8524
Fix DVC check
2020-06-22 00:30:05 +02:00
Ines Montani
5ba1df5e78
Update project CLI
2020-06-22 00:15:06 +02:00
Ines Montani
ef5f548fb0
Tidy up and auto-format
2020-06-21 22:38:04 +02:00
Ines Montani
f77e0bc028
Merge branch 'develop' into master-tmp
2020-06-21 22:34:15 +02:00
Ines Montani
40bb918a4c
Remove unicode declarations and tidy up
2020-06-21 22:34:10 +02:00
Ines Montani
275bab62df
Refactor CLI
2020-06-21 21:35:01 +02:00
Ines Montani
c12713a8be
Port CLI to Typer and add project stubs
2020-06-21 13:44:00 +02:00
Matthew Honnibal
6670c44390
Unskip tests
2020-06-21 01:17:52 +02:00
Matthew Honnibal
90d9f04e0b
Unskip
2020-06-21 01:16:33 +02:00
Matthew Honnibal
2b180ea033
Update test
2020-06-21 01:15:41 +02:00
Matthew Honnibal
192b94f0a1
Remove beam test
2020-06-21 01:15:12 +02:00
Matthew Honnibal
9db66ddd48
Update test_arc_eager_oracle
2020-06-21 01:12:28 +02:00
Matthew Honnibal
7544c21f5b
Update transition system
2020-06-21 01:12:05 +02:00
Matthew Honnibal
318a046fb0
Restore ArcEager.get_cost function
2020-06-21 01:11:08 +02:00
Matthew Honnibal
e90341810c
Update arc_eager oracle
2020-06-21 01:04:02 +02:00
Matthew Honnibal
c58deb3546
Work on parser oracle
2020-06-21 01:01:09 +02:00
svlandeg
689600e17d
add additional test back in (it works now)
2020-06-20 23:23:57 +02:00
svlandeg
2f6062a8a4
add line that got removed from EntityLinker
2020-06-20 23:14:45 +02:00
svlandeg
12dc8ab208
remove redundant code from master in EntityLinker
2020-06-20 23:07:42 +02:00
svlandeg
6179774278
fix test_build_dependencies by ignoring new libs
2020-06-20 22:49:37 +02:00
svlandeg
256d4c27c8
fix tagger begin_training being called without examples
2020-06-20 22:38:00 +02:00
Matthew Honnibal
914924a68b
Fix import
2020-06-20 22:22:40 +02:00
Matthew Honnibal
2791c1c0dc
Fix module name of corpus
2020-06-20 22:22:14 +02:00
Matthew Honnibal
4bbc277758
Update after removing GoldCorpus
2020-06-20 22:21:24 +02:00
Matthew Honnibal
64d00520e2
Update imports
2020-06-20 22:21:08 +02:00
Matthew Honnibal
cfd024536d
Remove GoldCorpus
2020-06-20 22:13:37 +02:00
Matthew Honnibal
fd83551eb5
Skip test causing segfault
2020-06-20 22:11:27 +02:00
svlandeg
5cb812e0ab
fix NER warn empty lookups (cf PR #5588)
2020-06-20 22:04:18 +02:00
Matthew Honnibal
095710e40e
Skip tests that cause crashes
2020-06-20 22:02:32 +02:00
Matthew Honnibal
0b23fd3891
Xfail some tests
2020-06-20 21:52:57 +02:00
Matthew Honnibal
6af99f2f2d
Fix parser declaration
2020-06-20 21:50:17 +02:00
Matthew Honnibal
52edb24f07
Update header
2020-06-20 21:50:06 +02:00
Matthew Honnibal
0c10831b14
Start debugging arc_eager oracle
2020-06-20 21:49:46 +02:00
Matthew Honnibal
2bcb5881d7
Fix parser model
2020-06-20 21:49:31 +02:00
Matthew Honnibal
396dd60b3a
Fix Corpus
2020-06-20 21:49:15 +02:00
Matthew Honnibal
450c6fe39c
Update train.py
2020-06-20 21:49:06 +02:00
svlandeg
c9242e9bf4
fix entity linker (cf PR #5548)
2020-06-20 21:47:23 +02:00
svlandeg
dc069e90b3
fix token.morph_ for v.3 (cf PR #5517)
2020-06-20 21:13:11 +02:00
Matthew Honnibal
6d821b2e55
Make doc.from_array several times faster
2020-06-20 20:17:13 +02:00
Matthew Honnibal
fa86aa581d
Allocate Doc before starting to add words
2020-06-20 20:15:21 +02:00
Matthew Honnibal
652f31d3ee
Update DocBin
2020-06-20 20:12:54 +02:00
Matthew Honnibal
0a8b6631a2
Update Corpus
2020-06-20 20:12:31 +02:00
Matthew Honnibal
11fa0658f7
Work on train script
2020-06-20 20:12:19 +02:00
Ines Montani
988d2a4eda
Add --code-path option to train CLI (#5618)
2020-06-20 18:43:12 +02:00
Matthew Honnibal
0de361cd00
Draft Corpus class for DocBin
2020-06-20 18:31:07 +02:00
Ines Montani
5424b70e51
Remove v2 test
2020-06-20 16:18:53 +02:00
Ines Montani
63c22969f4
Update test_issue5230.py
2020-06-20 16:17:48 +02:00
Ines Montani
296b5d633b
Remove references to Python 2 / is_python2
2020-06-20 16:11:13 +02:00
Matthew Honnibal
7360d3db72
Add json2docs converter
2020-06-20 16:02:53 +02:00
Ines Montani
0cdb631e6c
Fix merge errors
2020-06-20 16:02:42 +02:00
Matthew Honnibal
f1756a6a22
Remove jsonl converter
2020-06-20 16:02:40 +02:00
Matthew Honnibal
5d89b1840e
Update converter
2020-06-20 16:00:14 +02:00
Matthew Honnibal
f5780cb160
Serialize all attrs by default
2020-06-20 15:59:39 +02:00
Matthew Honnibal
3241acbe0b
Fix import
2020-06-20 15:56:28 +02:00
Matthew Honnibal
b7a366b435
Fix compile in ArcEager
2020-06-20 15:56:16 +02:00
Matthew Honnibal
91fa2f1126
Fix docbin
2020-06-20 15:56:05 +02:00
Matthew Honnibal
476bcd4c53
Fix import
2020-06-20 15:55:57 +02:00
Matthew Honnibal
7a846921a3
Make spacy convert output docbin
2020-06-20 15:55:35 +02:00
Ines Montani
52728d8fa3
Merge branch 'develop' into master-tmp
2020-06-20 15:52:00 +02:00
Ines Montani
f91e9e8c84
Remove F841 [ci skip]
2020-06-20 14:47:17 +02:00
Ines Montani
8283df80e9
Tidy up and auto-format
2020-06-20 14:15:04 +02:00
Matthew Honnibal
0d22c6e006
Allow DocBin to take list of Doc objects.
2020-06-20 03:50:36 +02:00
Matthew Honnibal
95df028758
Update converters
2020-06-20 03:50:23 +02:00
Matthew Honnibal
3a73d95dcc
Update converter to produce DocBin
2020-06-20 03:50:13 +02:00
Matthew Honnibal
d9a8fdf4b7
Fix name
2020-06-20 03:26:36 +02:00
Matthew Honnibal
e20a780867
Fix naming
2020-06-20 03:24:49 +02:00
Matthew Honnibal
f61d5e3ac3
Move things around
2020-06-20 03:23:58 +02:00
Matthew Honnibal
c630cfdb5e
Move converters under spacy.gold
2020-06-20 03:20:34 +02:00
Matthew Honnibal
161d8439fa
Start updating converters
2020-06-20 03:19:40 +02:00
Matthew Honnibal
a79f0598a6
Merge branch 'whatif/arrow' of https://github.com/explosion/spaCy into whatif/arrow
2020-06-20 02:36:40 +02:00
Matthew Honnibal
be81577719
Fix oracles
2020-06-20 02:36:12 +02:00
Marat M. Yavrumyan
8120b641cc
Update lex_attrs.py (#5608)
2020-06-19 20:00:34 +02:00
svlandeg
e30ec9b2a8
fix test checking for variants
2020-06-19 14:05:35 +02:00
svlandeg
25b0674320
clean up
2020-06-19 11:31:01 +02:00
svlandeg
c705a28438
add links to to_dict
2020-06-19 11:22:24 +02:00
Matthew Honnibal
03db143cd0
Draft new GoldCorpus class
2020-06-19 04:15:02 +02:00
Matthew Honnibal
a389866df6
Merge branch 'whatif/arrow' of https://github.com/explosion/spaCy into whatif/arrow
2020-06-19 02:30:27 +02:00
Matthew Honnibal
bd29b7b14f
Update parser and NER gold stuff
2020-06-19 02:29:16 +02:00
Matthew Honnibal
5ae9e3480d
Return ArcEagerGoldParse from ArcEager
2020-06-19 00:11:59 +02:00
svlandeg
6ca6d7d6b4
test for split sentences with various alignment issues, works
2020-06-18 20:01:02 +02:00
svlandeg
1951921230
implement split_sent with aligned SENT_START attribute
2020-06-18 19:41:53 +02:00
svlandeg
d1d6f16776
fix the fix
2020-06-18 19:15:32 +02:00
svlandeg
e822367cf7
prevent writing dummy values like deps because that could interfere with sent_start values
2020-06-18 17:47:59 +02:00
svlandeg
0b6d45eae1
various small fixes
2020-06-18 15:55:00 +02:00
svlandeg
1c71f2310c
fix renames and simple_ner labels
2020-06-18 15:33:28 +02:00
svlandeg
64fc840a5d
bugfix tok2vec
2020-06-18 15:24:40 +02:00
svlandeg
01f9ae774c
small fixes
2020-06-18 14:01:19 +02:00
svlandeg
0c6f1f3891
fix BiluoPushDown parsing entities
2020-06-18 13:00:03 +02:00
svlandeg
cd790aaa2a
fix parser tests to work with example (most still failing)
2020-06-18 11:19:22 +02:00
svlandeg
9f43ba839a
throw informative error when running the components with the wrong type of objects
2020-06-18 10:36:05 +02:00
svlandeg
6712d0b5db
textcat bugfix
2020-06-18 10:09:56 +02:00
svlandeg
40b2b21eef
small bug fix
2020-06-17 23:33:51 +02:00
svlandeg
d6c4dd6eea
pipe() takes docs, not examples
2020-06-17 21:29:36 +02:00
svlandeg
0f123af35e
ensure test keeps working with non-linked entities
2020-06-17 21:13:38 +02:00
svlandeg
6d73e139b0
fix entity linker
2020-06-17 21:12:25 +02:00
svlandeg
be5934b827
fix tagger
2020-06-17 19:42:11 +02:00
svlandeg
10d396977e
add support for MORPH in to/from_array, fix morphologizer overfitting test
2020-06-17 17:48:07 +02:00
svlandeg
1a151b10d6
correct silly typo
2020-06-17 14:48:14 +02:00
svlandeg
f6c451b650
cleanup
2020-06-17 14:45:54 +02:00
svlandeg
2d9f406188
fix test_cli
2020-06-17 14:42:48 +02:00
svlandeg
f7ad8e8c83
various fixes in scripts - needs to be further tested
2020-06-17 12:05:58 +02:00
svlandeg
3c4f9e4cc4
fix augment (needs further testing)
2020-06-17 10:46:29 +02:00
svlandeg
4ed399c848
minibatch utility can deal with strings, docs or examples
2020-06-16 21:35:55 +02:00
svlandeg
8b66c11ff2
add spaces to json output format
2020-06-16 19:30:03 +02:00
svlandeg
ba80ad7efd
fixed some tests + WIP roundtrip unit test
2020-06-16 18:26:50 +02:00
Ines Montani
e9d3e177f0
Merge branch 'master' into v2.3.x
2020-06-16 16:31:38 +02:00
svlandeg
43d41d6bb6
allow None as BILUO annotation
2020-06-16 15:30:05 +02:00
svlandeg
44a0f9c2c8
test_gold_biluo_different_tokenization works
2020-06-16 15:21:20 +02:00
svlandeg
1c35b8efcd
fix spaces
2020-06-16 12:08:25 +02:00
svlandeg
6fea5fa4bd
attempt to fix cases with weird spaces
2020-06-16 11:52:29 +02:00
svlandeg
0702a1d3fb
fix test for misaligned
2020-06-15 23:10:47 +02:00
svlandeg
a28f8f369e
Fix many-to-one IOB codes
2020-06-15 23:06:22 +02:00
svlandeg
12886b787b
fixing NER one-to-many alignment
2020-06-15 22:44:17 +02:00
Matthew Honnibal
7ff447c5a0
Set version to v2.3.0
2020-06-15 18:22:25 +02:00
Matthew Honnibal
a0bf73a5dd
Merge branch 'whatif/arrow' of https://github.com/explosion/spaCy into whatif/arrow
2020-06-15 18:16:01 +02:00
Matthew Honnibal
c66f93299e
Remove TokenAnnotation code from nonproj
2020-06-15 18:14:47 +02:00
Matthew Honnibal
c95494739c
Fix import
2020-06-15 18:11:10 +02:00
Matthew Honnibal
8f978f2031
Fix import
2020-06-15 18:10:47 +02:00
Matthew Honnibal
95de7efaad
Draft create_gold_state for arc_eager oracle
2020-06-15 18:10:19 +02:00
svlandeg
68986a252e
additional tests for new get_aligned function
2020-06-15 17:42:40 +02:00
svlandeg
41d29983a7
start testing get_aligned
2020-06-15 17:16:01 +02:00
svlandeg
fd5f199feb
fixing language and scoring tests
2020-06-15 15:02:05 +02:00
Adriane Boyd
0d8405aafa
Updates to docstrings (#5589)
2020-06-15 14:58:36 +02:00
Adriane Boyd
e867e9fa8f
Fix and add warnings related to spacy-lookups-data (#5588)
...
* Fix warning message for lemmatization tables
* Add a warning when the `lexeme_norm` table is empty. (Given the
relatively lang-specific loading for `Lookups`, it seemed like too much
overhead to dynamically extract the list of languages, so for now it's
hard-coded.)
2020-06-15 14:58:29 +02:00
Arvind Srinivasan
f698007907
Added Tamil Example Sentences (#5583)
...
* Added Examples for Tamil Sentences
#### Description
This PR adds example sentences for the Tamil language, which were missing as per issue #1107
#### Type of Change
This is an enhancement.
* Accepting spaCy Contributor Agreement
* Signed on my behalf as an individual
2020-06-15 14:58:21 +02:00
Adriane Boyd
c94f7d0e75
Updates to docstrings (#5589)
2020-06-15 14:56:51 +02:00
Adriane Boyd
c482f20778
Fix and add warnings related to spacy-lookups-data (#5588)
...
* Fix warning message for lemmatization tables
* Add a warning when the `lexeme_norm` table is empty. (Given the
relatively lang-specific loading for `Lookups`, it seemed like too much
overhead to dynamically extract the list of languages, so for now it's
hard-coded.)
2020-06-15 14:56:04 +02:00
svlandeg
b4d914ec77
fix error catching
2020-06-15 12:56:32 +02:00
svlandeg
b9c9cbb2cd
informative error when calling to_array with wrong field
2020-06-15 11:53:31 +02:00
svlandeg
ff231e1cdd
fix merge conflict
2020-06-15 09:04:19 +02:00
svlandeg
a48553c1ed
fix error numbers
2020-06-15 08:51:31 +02:00
Matthew Honnibal
3c0fc10dc4
Remove beam for now (maybe)
...
Remove beam_utils
Update setup.py
Remove beam
2020-06-14 19:53:29 +02:00