spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-03-12 07:15:48 +03:00

Author	SHA1	Message	Date
svlandeg	4ed399c848	minibatch utiltiy can deal with strings, docs or examples	2020-06-16 21:35:55 +02:00
svlandeg	8b66c11ff2	add spaces to json output format	2020-06-16 19:30:03 +02:00
svlandeg	ba80ad7efd	fixed some tests + WIP roundtrip unit test	2020-06-16 18:26:50 +02:00
svlandeg	43d41d6bb6	allow None as BILUO annotation	2020-06-16 15:30:05 +02:00
svlandeg	44a0f9c2c8	test_gold_biluo_different_tokenization works	2020-06-16 15:21:20 +02:00
svlandeg	1c35b8efcd	fix spaces	2020-06-16 12:08:25 +02:00
svlandeg	0702a1d3fb	fix test for misaligned	2020-06-15 23:10:47 +02:00
svlandeg	a28f8f369e	Fix many-to-one IOB codes	2020-06-15 23:06:22 +02:00
svlandeg	12886b787b	fixing NER one-to-many alignment	2020-06-15 22:44:17 +02:00
svlandeg	68986a252e	additional tests for new get_aligned function	2020-06-15 17:42:40 +02:00
svlandeg	41d29983a7	start testing get_aligned	2020-06-15 17:16:01 +02:00
svlandeg	fd5f199feb	fixing language and scoring tests	2020-06-15 15:02:05 +02:00
Matthew Honnibal	98ca14f577	Remove GoldParse WIP on removing goldparse Get ArcEager compiling after GoldParse excise Update setup.py Get spacy.syntax compiling after removing GoldParse Rename NewExample -> Example and clean up Clean html files Start updating tests Update Morphologizer	2020-06-14 19:53:30 +02:00
Matthew Honnibal	706e652820	Merge from develop	2020-06-14 17:35:01 +02:00
Matthew Honnibal	7de997c0a5	Update test	2020-06-13 23:11:45 +02:00
Matthew Honnibal	3eb8f3867e	Update test	2020-06-13 23:05:16 +02:00
svlandeg	face0de74f	fix MORPH conversion + enable unit test	2020-06-12 16:29:09 +02:00
svlandeg	880dccf93e	entities on doc_annotation, parse links and check their offsets against the entities. unit test works	2020-06-12 15:47:20 +02:00
svlandeg	3aed177a35	fix ENT_IOB conversion and enable unit test	2020-06-12 11:30:24 +02:00
Sofie Van Landeghem	c0f4a1e43b	train is from-config by default (#5575) * verbose and tag_map options * adding init_tok2vec option and only changing the tok2vec that is specified * adding omit_extra_lookups and verifying textcat config * wip * pretrain bugfix * add replace and resume options * train_textcat fix * raw text functionality * improve UX when KeyError or when input data can't be parsed * avoid unnecessary access to goldparse in TextCat pipe * save performance information in nlp.meta * add noise_level to config * move nn_parser's defaults to config file * multitask in config - doesn't work yet * scorer offering both F and AUC options, need to be specified in config * add textcat verification code from old train script * small fixes to config files * clean up * set default config for ner/parser to allow create_pipe to work as before * two more test fixes * small fixes * cleanup * fix NER pickling + additional unit test * create_pipe as before	2020-06-12 02:02:07 +02:00
svlandeg	6a67a11682	adding tests for new example class (some still failing - WIP)	2020-06-11 17:43:40 +02:00
Matthew Honnibal	488727aee0	Start updating test	2020-06-09 23:58:28 +02:00
Matthew Honnibal	ccd332a9fc	Update test stubs	2020-06-09 15:49:04 +02:00
Matthew Honnibal	f1189dc205	Draft tests for new Example class	2020-06-09 15:43:08 +02:00
Matthew Honnibal	c833ebe1ad	Start tests for new example class	2020-06-09 15:29:05 +02:00
Matthew Honnibal	d9289712ba	* Make GoldCorpus return dict, not Example * Make Example require a Doc object (previously optional) Clarify methods in GoldCorpus WIP refactor Example Refactor Example.split_sents Fix test Fix augment Update test Update test Fix import Update test_scorer Update Example	2020-06-09 01:01:59 +02:00
Matthew Honnibal	084271c9e9	Remove GoldParse from public API * Move get_parses_from_example to spacy.syntax * Get GoldParse out of Example * Avoid expecting GoldParse input in parser * Add Alignment to spacy.gold.align * Update Example object * Add comment * Update pipeline * Fix imports * Simplify gold_io * WIP on GoldCorpus * Update test * Xfail some gold tests * Remove ignore_misaligned option from GoldCorpus * Fix Example constructor * Update test * Fix usage of Example * Add deprecated_get_gold method on Example * Patch scorer * Fix test * Fix test * Update tests * Xfail a test * Fix passing of make_projective * Pass make_projective by default * Hack data format in Example.from_dict * Update tests * Fix example.from_dict * Update morphologizer * Fix entity linker * Add get_field to TokenAnnotation * Fix Example.get_aligned * Update test * Fix alignment * Fix corpus * Fix GoldCorpus * Handle misaligned * Format * Fix missing import	2020-06-08 22:09:57 +02:00
Ines Montani	d93cbeb14f	Add warning for loose version constraints (#5536) * Add warning for loose version constraints * Update wording [ci skip] * Tweak error message Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>	2020-06-05 12:42:15 +02:00
Matthew Honnibal	8411d4f4e6	Merge pull request #5543 from svlandeg/feature/pretrain-config pretrain from config	2020-06-04 19:07:12 +02:00
Ines Montani	810fce3bb1	Merge branch 'develop' into master-tmp	2020-06-03 14:36:59 +02:00
svlandeg	eac12cbb77	make dropout in embed layers configurable	2020-06-03 11:50:16 +02:00
svlandeg	6504b7f161	Merge remote-tracking branch 'upstream/develop' into feature/pretrain-config	2020-06-03 08:30:16 +02:00
svlandeg	c5ac382f0a	fix name clash	2020-06-02 22:24:57 +02:00
svlandeg	2bf5111ecf	additional test with discard_oversize=False	2020-06-02 22:09:37 +02:00
svlandeg	aa6271b16c	extending algorithm to deal better with edge cases	2020-06-02 22:05:08 +02:00
svlandeg	6208d322d3	slightly more challenging unit test	2020-06-02 19:47:30 +02:00
svlandeg	6651fafd5c	using overflow buffer for examples within the tolerance margin	2020-06-02 19:43:39 +02:00
svlandeg	85b0597ed5	add test for minibatch util	2020-06-02 18:26:21 +02:00
svlandeg	e0f9f448f1	remove Tensorizer	2020-06-01 23:38:48 +02:00
Ines Montani	b7aff6020c	Make functions more general purpose and update docstrings and tests	2020-05-30 15:18:53 +02:00
Ines Montani	e47e5a4b10	Use more sophisticated version parsing logic	2020-05-30 15:01:58 +02:00
Ines Montani	387c7aba15	Update test	2020-05-24 14:55:16 +02:00
Ines Montani	4465cad6c5	Rename spacy.analysis to spacy.pipe_analysis	2020-05-22 17:42:06 +02:00
Ines Montani	25d6ed3fb8	Merge pull request #5489 from explosion/feature/connected-components	2020-05-22 17:40:11 +02:00
Ines Montani	841c05b47b	Merge pull request #5490 from explosion/fix/remove-jsonschema	2020-05-22 17:39:54 +02:00
Ines Montani	569a65b60e	Auto-format	2020-05-22 16:55:42 +02:00
Ines Montani	d844528c5f	Add test for is_compatible_model	2020-05-22 16:55:15 +02:00
Ines Montani	12b7be1d98	Remove jsonschema from dependencies	2020-05-22 16:49:26 +02:00
Matthew Honnibal	f7f6df7275	Move to spacy.analysis	2020-05-22 16:43:18 +02:00
Matthew Honnibal	78d79d94ce	Guess set_annotations=True in nlp.update During `nlp.update`, components can be passed a boolean set_annotations to indicate whether they should assign annotations to the `Doc`. This needs to be called if downstream components expect to use the annotations during training, e.g. if we wanted to use tagger features in the parser. Components can specify their assignments and requirements, so we can figure out which components have these inter-dependencies. After figuring this out, we can guess whether to pass set_annotations=True. We could also call set_annotations=True always, or even just have this as the only behaviour. The downside of this is that it would require the `Doc` objects to be created afresh to avoid problematic modifications. One approach would be to make a fresh copy of the `Doc` objects within `nlp.update()`, so that we can write to the objects without any problems. If we do that, we can drop this logic and also drop the `set_annotations` mechanism. I would be fine with that approach, although it runs the risk of introducing some performance overhead, and we'll have to take care to copy all extension attributes etc.	2020-05-22 15:55:45 +02:00
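Commit 78d79d94ce describes how `set_annotations` could be inferred from what each component assigns and requires. The snippet below is a minimal sketch of that dependency check. It is not spaCy's implementation: `ComponentMeta`, its `assigns`/`requires` fields (plain lists of attribute-name strings) and `guess_set_annotations` are hypothetical names used only for illustration. The rule it encodes is that a component writes its annotations to the `Doc` during training only if some component later in the pipeline requires an attribute it assigns.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ComponentMeta:
    """Hypothetical per-component metadata: attributes it writes (assigns)
    and attributes it reads (requires), e.g. "token.tag"."""
    name: str
    assigns: List[str] = field(default_factory=list)
    requires: List[str] = field(default_factory=list)


def guess_set_annotations(pipeline: List[ComponentMeta]) -> Dict[str, bool]:
    """For each component (in pipeline order), decide whether it should set
    annotations during the update: True only if a *later* component
    requires something this component assigns."""
    decisions: Dict[str, bool] = {}
    for i, comp in enumerate(pipeline):
        assigned = set(comp.assigns)
        downstream = pipeline[i + 1:]
        # Only downstream components can consume annotations written now.
        decisions[comp.name] = any(
            assigned.intersection(later.requires) for later in downstream
        )
    return decisions


# Example: a parser that uses tagger features (the case named in the commit).
pipeline = [
    ComponentMeta("tagger", assigns=["token.tag"]),
    ComponentMeta("parser", assigns=["token.dep", "token.head"], requires=["token.tag"]),
    ComponentMeta("ner", assigns=["doc.ents"]),
]
print(guess_set_annotations(pipeline))
# {'tagger': True, 'parser': False, 'ner': False}
```

The commit also mentions an alternative: always set annotations on a fresh copy of each `Doc`. That approach would make this inference unnecessary, at the cost of copying overhead and the need to copy extension attributes.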


1668 Commits