Commit Graph

89 Commits

Author SHA1 Message Date
Matthew Honnibal
98c026195b Add assertion 2020-06-25 18:52:55 +02:00
Matthew Honnibal
6bda23ad26 Improve handling of missing values in NER 2020-06-25 16:26:44 +02:00
Matthew Honnibal
b8c85e593b Improve NER alignment 2020-06-25 16:26:13 +02:00
Matthew Honnibal
b3df6228dd Fix spaces reading 2020-06-25 15:20:00 +02:00
Matthew Honnibal
c39401105b Set spaces on gold doc after conversion 2020-06-25 15:19:36 +02:00
Matthew Honnibal
c2fd1e4eb9 Fix merge 2020-06-25 03:58:32 +02:00
Matthew Honnibal
8bbf31a582 Output unlabelled spans from O biluo tags in iob_utils 2020-06-24 18:03:44 +02:00
Matthew Honnibal
7eb064854e Fix handling of NER data in Example 2020-06-24 18:03:24 +02:00
Matthew Honnibal
78e9e15e9e Fix conversion of NER data 2020-06-23 23:58:27 +02:00
Matthew Honnibal
b82431207d Simplify NER alignment 2020-06-23 23:57:54 +02:00
Matthew Honnibal
a68d0e63f0 Support max_length in Corpus 2020-06-23 22:57:40 +02:00
svlandeg
28ad71c187 bugfix excl Span.end in iob2docs 2020-06-23 17:20:41 +02:00
svlandeg
351ab3a3d4 pull merge_sent into iob2docs to avoid Doc creation for each line 2020-06-23 16:47:30 +02:00
Matthew Honnibal
8722b65bce Fix json2docs converter 2020-06-23 13:19:26 +02:00
Matthew Honnibal
7376518af2 Fix gold_preproc 2020-06-23 12:01:29 +02:00
Matthew Honnibal
8f420f3978 Merge branch 'whatif/arrow' of https://github.com/explosion/spaCy into whatif/arrow 2020-06-22 17:49:16 +02:00
Matthew Honnibal
03b3da26be Support gold_preproc 2020-06-22 17:48:38 +02:00
Matthew Honnibal
2d34d2f24a Support gold_preproc in Corpus 2020-06-22 17:47:12 +02:00
svlandeg
54855e3f3a various small fixes 2020-06-22 17:33:19 +02:00
Matthew Honnibal
afe6ee4548 Fix Corpus 2020-06-22 16:28:47 +02:00
Matthew Honnibal
2de72b30fe Remove prints 2020-06-22 15:34:55 +02:00
Matthew Honnibal
b250f6b62f Update test 2020-06-22 14:59:05 +02:00
Matthew Honnibal
72ab21166d Work on Example.get_aligned_ner method 2020-06-22 14:55:33 +02:00
svlandeg
5e71919322 avoid writing temp dir in json2docs, fixing 4402 test 2020-06-22 14:27:35 +02:00
svlandeg
0b3985d307 limit arg for Corpus 2020-06-22 10:22:26 +02:00
svlandeg
0d64c435b0 small fixes 2020-06-22 10:05:12 +02:00
Matthew Honnibal
6a75992af6 Format 2020-06-22 01:11:43 +02:00
Matthew Honnibal
e634bae69e Fix Corpus 2020-06-22 00:54:38 +02:00
Matthew Honnibal
3354758351 Remove Example.doc property 2020-06-22 00:54:38 +02:00
Matthew Honnibal
7d329cd1ac Add kwargs to Corpus.dev_dataset to match train_dataset 2020-06-22 00:54:38 +02:00
Matthew Honnibal
59098a5f62 Add get_aligned_parse method in Example
Fix Example.get_aligned_parse
2020-06-22 00:54:38 +02:00
Matthew Honnibal
75a5f2d499 Remove GoldCorpus
Update imports

Update after removing GoldCorpus

Fix module name of corpus

Fix import
2020-06-22 00:54:38 +02:00
Matthew Honnibal
17226a60ac Draft Corpus class for DocBin
Update Corpus

Fix Corpus
2020-06-22 00:51:22 +02:00
Matthew Honnibal
a5ebfb20f5 Serialize all attrs by default
Move converters under spacy.gold

Move things around

Fix naming

Fix name

Update converter to produce DocBin

Update converters

Make spacy convert output docbin

Fix import

Fix docbin

Fix import

Update converter

Remove jsonl converter

Add json2docs converter
2020-06-22 00:46:08 +02:00
svlandeg
6d5bfd6f6a fix test checking for variants 2020-06-22 00:46:08 +02:00
svlandeg
5477bf054f add links to to_dict 2020-06-22 00:46:08 +02:00
Matthew Honnibal
03db143cd0 Draft new GoldCorpus class 2020-06-19 04:15:02 +02:00
svlandeg
1951921230 implement split_sent with aligned SENT_START attribute 2020-06-18 19:41:53 +02:00
svlandeg
d1d6f16776 fix the fix 2020-06-18 19:15:32 +02:00
svlandeg
e822367cf7 prevent writing dummy values like deps because that could interfere with sent_start values 2020-06-18 17:47:59 +02:00
svlandeg
0c6f1f3891 fix BiluoPushDown parsing entities 2020-06-18 13:00:03 +02:00
svlandeg
40b2b21eef small bug fix 2020-06-17 23:33:51 +02:00
svlandeg
6d73e139b0 fix entity linker 2020-06-17 21:12:25 +02:00
svlandeg
10d396977e add support for MORPH in to/from_array, fix morphologizer overfitting test 2020-06-17 17:48:07 +02:00
svlandeg
1a151b10d6 correct silly typo 2020-06-17 14:48:14 +02:00
svlandeg
2d9f406188 fix test_cli 2020-06-17 14:42:48 +02:00
svlandeg
f7ad8e8c83 various fixes in scripts - needs to be further tested 2020-06-17 12:05:58 +02:00
svlandeg
3c4f9e4cc4 fix augment (needs further testing) 2020-06-17 10:46:29 +02:00
svlandeg
4ed399c848 minibatch utility can deal with strings, docs or examples 2020-06-16 21:35:55 +02:00
svlandeg
8b66c11ff2 add spaces to json output format 2020-06-16 19:30:03 +02:00