spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-11-14 05:37:03 +03:00

Author	SHA1	Message	Date
Adriane Boyd	a7a7e0d2a6	Add morph to morphology in Doc.from_array (#5762) * Add morph to morphology in Doc.from_array Add morphological analyses to morphology table in `Doc.from_array`. * Use separate vocab in DocBin roundtrip test	2020-07-14 14:07:35 +02:00
Ines Montani	872938ec76	Merge pull request #5747 from explosion/feature/refactor-config-args	2020-07-14 00:00:22 +02:00
Sofie Van Landeghem	6f3bb6f77c	fix doc.to_utf8 on GPU (#5757)	2020-07-13 23:05:33 +02:00
Ines Montani	ed55143c0d	Merge branch 'develop' into compat/remove-object-subclass	2020-07-12 14:28:52 +02:00
Ines Montani	7906ddd56c	Fix test	2020-07-12 14:28:34 +02:00
Ines Montani	5f6f4ff594	Remove object subclassing	2020-07-12 14:03:23 +02:00
Ines Montani	c96535e338	Update command docstrings and docs	2020-07-12 13:53:49 +02:00
Ines Montani	0ab483037c	Make debug commands subcommands of spacy debug Also handle backwards-compatibility so the old commands don't break	2020-07-12 13:53:41 +02:00
Ines Montani	8a67ddd6f1	Remove unused import	2020-07-12 12:32:24 +02:00
Ines Montani	d1d7fd5f5d	Don't use file paths in schemas It should be possible to validate top-level config with file paths that don't exist	2020-07-12 12:32:08 +02:00
Ines Montani	79346853aa	Add debug-config command	2020-07-12 12:31:17 +02:00
Ines Montani	3a8632c3fb	Hide command from public --help for now Not sure we want this to be officially documented yet?	2020-07-11 19:21:22 +02:00
Ines Montani	5e683d03fe	Allow extra args on pretrain and debug_data	2020-07-11 19:17:59 +02:00
Ines Montani	b7111da1d7	Update config and commands	2020-07-11 13:03:53 +02:00
Ines Montani	f99ce7fbfb	Make validation errors more elegant	2020-07-10 23:34:17 +02:00
Ines Montani	7b5717cac3	Merge branch 'develop' into feature/refactor-config-args	2020-07-10 22:50:07 +02:00
Matthew Honnibal	743f7fb73a	Set version to v3.0.0a4	2020-07-10 22:40:12 +02:00
Matthew Honnibal	b68216e263	Explicitly delete objects after parser.update to free GPU memory (#5748) * Try explicitly deleting objects * Refactor parser model backprop slightly * Free parser data explicitly after rehearse and update	2020-07-10 22:35:20 +02:00
Ines Montani	fb6f6f584e	Replace - with _ in command names We might as well be nice if user accidentally types --training.use-gpu	2020-07-10 22:34:22 +02:00
Ines Montani	bfa8e11ffa	Update and auto-format	2020-07-10 20:52:00 +02:00
Ines Montani	0389c34b81	Merge branch 'develop' into feature/refactor-config-args	2020-07-10 20:51:52 +02:00
Ines Montani	931250e1f5	Fix pipeline component schema	2020-07-10 20:32:53 +02:00
Ines Montani	9fe1fa88ad	Fix typo	2020-07-10 20:32:37 +02:00
Ines Montani	defe1e7213	Pretty-print config validation errors	2020-07-10 20:01:20 +02:00
Sofie Van Landeghem	de6a32315c	debug-model script (#5749) * adding debug-model to print the internals for debugging purposes * expand debug-model script with 4 stages: before, init, train, predict * avoid enforcing to have a seed in the train script * small fixes	2020-07-10 19:47:53 +02:00
Ines Montani	a3667394b4	Integrate with latest Thinc and config overrides	2020-07-10 19:47:05 +02:00
Ines Montani	5cfc3edcaa	Update CLI tests	2020-07-10 18:21:01 +02:00
Ines Montani	3583ea84d8	Update arg parsing	2020-07-10 18:20:52 +02:00
Ines Montani	73332ddb67	Update CLI commands to use one shared util file	2020-07-10 17:57:40 +02:00
Ines Montani	240e0a62ca	Update with WIP	2020-07-10 13:31:27 +02:00
Ines Montani	a60562f208	Update project CLI hashes, directories, skipping (#5741) * Update project CLI hashes, directories, skipping * Improve clone success message * Remove unused context args * Move project-specific utils to project utils The hashing/checksum functions may not end up being general-purpose functions and are more designed for the projects, so they shouldn't live in spacy.util * Improve run help and add workflows * Add note re: directory checksum speed * Fix cloning from subdirectories and output messages * Remove hard-coded dirs	2020-07-09 23:51:18 +02:00
Matthew Honnibal	552d1ad226	Hack at tests	2020-07-09 20:25:51 +02:00
Matthew Honnibal	eb064c59cd	Try to fix textcat test	2020-07-09 20:24:53 +02:00
Ines Montani	018319a640	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2020-07-09 19:44:41 +02:00
Ines Montani	05e182e421	Update CLI args and docstrings	2020-07-09 19:44:28 +02:00
Sofie Van Landeghem	dd207a28be	cleanup components API (#5726) * add keyword separator for update functions and drop unused "state" * few more Example tests and various small fixes * consistently return losses after update call * eliminate unused tensors field across pipe components * fix name * fix arg name	2020-07-09 19:43:39 +02:00
Adriane Boyd	ac4297ee39	Minor refactor to conversion of output docs (#5718) Minor refactor of conversion of docs to output format to avoid duplicate conversion steps.	2020-07-09 19:42:32 +02:00
Sofie Van Landeghem	c1ea55307b	Fixing reproducible training (#5735) * Add initial reproducibility tests * failing test for default_text_classifier (WIP) * track trouble to underlying tok2vec layer * add regression test for Issue 5551 * tests go green with https://github.com/explosion/thinc/pull/359 * update test * adding fixed seeds to HashEmbed layers, seems to fix the reproducibility issue Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>	2020-07-09 19:39:31 +02:00
Matthew Honnibal	1827f22f56	Set version to v3.0.0a3	2020-07-09 19:38:04 +02:00
Matthew Honnibal	7010f1a2be	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2020-07-09 19:34:11 +02:00
Matthew Honnibal	77af0a6bb4	Offer option of padding-sensitive batching	2020-07-09 14:50:20 +02:00
Matthew Honnibal	3a7f275c02	Add extra batch util	2020-07-09 14:38:41 +02:00
Matthew Honnibal	eb0798c421	Add __len__ method for Example	2020-07-09 14:38:26 +02:00
Ines Montani	8f9552d9e7	Refactor project CLI (#5732) * Make project command a submodule * Update with WIP * Add helper for joining commands * Update docstrings, formatting and types * Update assets and add support for copying local files * Fix type * Update success messages	2020-07-09 01:42:51 +02:00
Adriane Boyd	ad15499b3b	Fix get_loss for values outside of labels in senter (#5730) * Fix get_loss for None alignments in senter When converting the `sent_start` values back to `SentenceRecognizer` labels, handle `None` alignments. * Handle SENT_START as -1 Handle SENT_START as -1 (or -1 converted to uint64) by treating any values other than 1 the same as 0 in `SentenceRecognizer.get_loss`.	2020-07-09 01:41:58 +02:00
Matthew Honnibal	1b20ffac38	batch_by_words by default	2020-07-08 21:37:06 +02:00
Matthew Honnibal	93e50da46a	Remove auto 'set_annotation' in training to address GPU memory	2020-07-08 21:36:51 +02:00
Matthew Honnibal	fb8a5967c1	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2020-07-08 15:27:50 +02:00
Ines Montani	0a3d41bb1d	Deprecate model shortcuts and simplify download (#5722)	2020-07-08 14:00:07 +02:00
Adriane Boyd	c9f0f75778	Update get_loss for senter and morphologizer (#5724) * Update get_loss for senter Update `SentenceRecognizer.get_loss` to keep it similar to `Tagger`. * Update get_loss for morphologizer Update `Morphologizer.get_loss` to keep it similar to `Tagger`.	2020-07-08 13:59:28 +02:00

7242 Commits