spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-11-18 16:56:07 +03:00

Author	SHA1	Message	Date
Paul O'Leary McCann	5aff2b8204	Merge branch 'v4' into feature/multiple-code-files	2023-02-02 12:34:59 +09:00
Peter Baumgartner	c68e6b8a96	`trainable_lemmatizer` in `debug data` (#11419) * WIP * rm ipython embeds * rm total * WIP * cleanup * cleanup + reword * rm component function * remove migration support form * fix reference dataset for dev data * additional fixes - set approach to identifying unique trees - adjust line length on messages - add logic for detecting docs without annotations * use 0 instead of none for no annotation * partial annotation support * initial tests for _compile_gold lemma attributes Using the example data from the edit tree lemmatizer tests for: - lemmatizer_trees - partial_lemma_annotations - n_low_cardinality_lemmas - no_lemma_annotations * adds output test for cli app * switch msg level * rm unclear uniqueness check * Revert "rm unclear uniqueness check" This reverts commit `6ea2b3524b`. * remove good message on uniqueness * formatting * use en_vocab fixture * clarify data set source in messages * remove unnecessary import Co-authored-by: svlandeg <svlandeg@github.com>	2023-01-26 17:36:50 +01:00
Paul O'Leary McCann	0f78418c5c	Mark tests as slow	2023-01-26 14:28:10 +09:00
Paul O'Leary McCann	2f74158b32	Add evaluate test and some cleanup	2023-01-26 14:27:18 +09:00
Paul O'Leary McCann	a060ed21e8	Add output arg for assemble and pretrain Assemble and pretrain require an output argument. This commit adds assemble testing, but not pretrain, as that requires an actual trainable component, which is not currently in the test config.	2023-01-25 19:59:38 +09:00
Paul O'Leary McCann	9912eff0b5	Use a more generic, parametrized test	2023-01-25 18:37:55 +09:00
Paul O'Leary McCann	6d594b966c	Add debug config test and restructure The code argument imports the provided file. If it adds item to the registry, that affects global state, which CliRunner doesn't isolate. Since there's no standard way to remove things from the registry, this instead uses subprocess.run to run commands.	2023-01-25 15:42:42 +09:00
Paul O'Leary McCann	5cddb4e320	Add debug data test, plus generic fixtures One tricky thing here: it's tempting to create the config by creating a pipeline in code, but that requires declaring the custom components here. However the CliRunner appears to be run in the same process or otherwise have access to our registry, so it works even without any code arguments. So it's necessary to avoid declaring the components in the tests.	2023-01-25 14:43:35 +09:00
Daniël de Kok	319eb508b5	Add a `spacy benchmark speed` subcommand (#11902) * Add a `spacy evaluate speed` subcommand This subcommand reports the mean batch performance of a model on a data set with a 95% confidence interval. For reliability, it first performs some warmup rounds. Then it will measure performance on batches with randomly shuffled documents. To avoid having too many spaCy commands, `speed` is a subcommand of `evaluate` and accuracy evaluation is moved to its own `evaluate accuracy` subcommand. * Fix import cycle * Restore `spacy evaluate`, make `spacy benchmark speed` an alias * Add documentation for `spacy benchmark` * CREATES -> PRINTS * WPS -> words/s * Disable formatting of benchmark speed arguments * Fail with an error message when trying to speed bench empty corpus * Make it clearer that `benchmark accuracy` is a replacement for `evaluate` * Fix docstring webpage reference * tests: check `evaluate` output against `benchmark accuracy`	2023-01-12 11:55:21 +01:00
Sofie Van Landeghem	7f6c638c3a	fix processing of "auto" in convert (#12050) * fix processing of "auto" in walk_directory * add check for None * move AUTO check to convert and fix verification of args * add specific CLI test with CliRunner * cleanup * more cleanup * update docstring	2023-01-05 10:21:00 +01:00
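
Commits `5cddb4e320` and `6d594b966c` above note that an in-process CLI runner shares the registry with the test process, and that the tests therefore run commands through `subprocess.run`. The snippet below is a minimal sketch of that isolation idea only. The module-level `REGISTRY` dict and both helper functions are hypothetical stand-ins, not spaCy's registry or its actual test code.

```python
import subprocess
import sys

# Hypothetical stand-in for a global registry; not spaCy's actual registry.
REGISTRY = {}


def register_in_process(name):
    """Simulate importing a code file that registers a component here."""
    REGISTRY[name] = object()


def register_in_subprocess(name):
    """Do the same registration in a fresh interpreter and return its stdout."""
    script = (
        "REGISTRY = {}\n"
        f"REGISTRY[{name!r}] = object()\n"
        "print(sorted(REGISTRY))\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", script],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# The child process sees its own registration...
child_output = register_in_subprocess("custom_component")
# ...but nothing leaks back into this process.
leaked_after_subprocess = "custom_component" in REGISTRY

# An in-process registration, by contrast, changes global state for later tests.
register_in_process("custom_component")
leaked_after_in_process = "custom_component" in REGISTRY
```

The trade-off is speed: every command pays interpreter start-up cost, which is consistent with commit `0f78418c5c` marking the tests as slow.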

10 Commits
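
The message of commit `319eb508b5` describes the speed benchmark as warmup rounds followed by timed batches of randomly shuffled documents, reported as a mean with a 95% confidence interval. The following is a rough sketch of that measurement loop under stated assumptions: `process_batch` is a hypothetical stand-in for running a pipeline, the parameter names and defaults are invented for illustration, and the interval uses a normal approximation because the commit message does not say how the interval is computed.

```python
import random
import statistics
import time


def process_batch(batch):
    """Hypothetical stand-in for running a pipeline over a batch of texts."""
    return [len(text.split()) for text in batch]


def benchmark(texts, batch_size=4, warmup_rounds=2, n_batches=20, seed=0):
    """Return (mean words/s, half-width of an approximate 95% interval)."""
    if not texts:
        # Mirrors the commit's "fail with an error message" on an empty corpus.
        raise ValueError("Cannot benchmark speed on an empty corpus.")
    rng = random.Random(seed)

    # Warmup: process the data without recording any timings.
    for _ in range(warmup_rounds):
        process_batch(texts)

    rates = []
    for _ in range(n_batches):
        # Draw a random batch of documents for each timed round.
        batch = rng.sample(texts, k=min(batch_size, len(texts)))
        n_words = sum(len(text.split()) for text in batch)
        start = time.perf_counter()
        process_batch(batch)
        elapsed = time.perf_counter() - start
        rates.append(n_words / max(elapsed, 1e-9))

    mean = statistics.mean(rates)
    # Assumption: normal approximation (1.96 standard errors) for the interval.
    half_width = 1.96 * statistics.stdev(rates) / len(rates) ** 0.5
    return mean, half_width


corpus = ["the quick brown fox", "jumps over", "the lazy dog today"] * 10
mean_wps, ci = benchmark(corpus)
print(f"{mean_wps:.0f} words/s (+/- {ci:.0f})")
```

The printed numbers depend on the machine, so no expected output is shown.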