spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-12-08 10:44:30 +03:00

Author	SHA1	Message	Date
Adriane Boyd	f7a260748a	isort	2023-07-18 15:20:50 +02:00
Adriane Boyd	8984a20d17	Fix imports in tests	2023-07-18 15:19:19 +02:00
Adriane Boyd	20880db680	isort	2023-07-18 15:07:52 +02:00
Adriane Boyd	28c8a577fc	Merge branch 'v4' into feature/multiple-code-files	2023-07-18 15:06:18 +02:00
Daniël de Kok	2468742cb8	isort all the things	2023-06-26 11:41:03 +02:00
Sofie Van Landeghem	d65e3c31a6	use system-independent commands (#12693)	2023-06-08 11:43:36 +02:00
Adriane Boyd	bae85e4c82	Merge branch 'v4' into feature/multiple-code-files	2023-03-17 08:44:10 +01:00
Adriane Boyd	f27bce67fd	Skip project clone tests if git is not available (#12394)	2023-03-09 16:41:21 +01:00
Paul O'Leary McCann	1e8bac99f3	Add tests for projects to master (#12303) * Add tests for projects to master * Fix git clone related issues on Windows * Add stat import	2023-02-23 10:22:57 +01:00
Paul O'Leary McCann	7ef87e24ca	Merge branch 'v4' into feature/multiple-code-files	2023-02-06 14:43:19 +09:00
Paul O'Leary McCann	5aff2b8204	Merge branch 'v4' into feature/multiple-code-files	2023-02-02 12:34:59 +09:00
Adriane Boyd	606273f7e4	Normalize whitespace in evaluate CLI output test (#12157) * Normalize whitespace in evaluate CLI output test. Depending on terminal settings, lines may be padded to the screen width so the comparison is too strict with only the command string replacement. * Move to test util method * Change to normalization method	2023-01-27 16:13:34 +01:00
Peter Baumgartner	c68e6b8a96	`trainable_lemmatizer` in `debug data` (#11419) * WIP * rm ipython embeds * rm total * WIP * cleanup * cleanup + reword * rm component function * remove migration support form * fix reference dataset for dev data * additional fixes - set approach to identifying unique trees - adjust line length on messages - add logic for detecting docs without annotations * use 0 instead of none for no annotation * partial annotation support * initial tests for _compile_gold lemma attributes Using the example data from the edit tree lemmatizer tests for: - lemmatizer_trees - partial_lemma_annotations - n_low_cardinality_lemmas - no_lemma_annotations * adds output test for cli app * switch msg level * rm unclear uniqueness check * Revert "rm unclear uniqueness check" This reverts commit `6ea2b3524b`. * remove good message on uniqueness * formatting * use en_vocab fixture * clarify data set source in messages * remove unnecessary import Co-authored-by: svlandeg <svlandeg@github.com>	2023-01-26 17:36:50 +01:00
Paul O'Leary McCann	0f78418c5c	Mark tests as slow	2023-01-26 14:28:10 +09:00
Paul O'Leary McCann	2f74158b32	Add evaluate test and some cleanup	2023-01-26 14:27:18 +09:00
Paul O'Leary McCann	a060ed21e8	Add output arg for assemble and pretrain. Assemble and pretrain require an output argument. This commit adds assemble testing, but not pretrain, as that requires an actual trainable component, which is not currently in the test config.	2023-01-25 19:59:38 +09:00
Paul O'Leary McCann	9912eff0b5	Use a more generic, parametrized test	2023-01-25 18:37:55 +09:00
Paul O'Leary McCann	6d594b966c	Add debug config test and restructure. The code argument imports the provided file. If it adds items to the registry, that affects global state, which CliRunner doesn't isolate. Since there's no standard way to remove things from the registry, this instead uses subprocess.run to run commands.	2023-01-25 15:42:42 +09:00
Paul O'Leary McCann	5cddb4e320	Add debug data test, plus generic fixtures. One tricky thing here: it's tempting to create the config by creating a pipeline in code, but that requires declaring the custom components here. However, the CliRunner appears to be run in the same process or otherwise have access to our registry, so it works even without any code arguments. So it's necessary to avoid declaring the components in the tests.	2023-01-25 14:43:35 +09:00
Daniël de Kok	319eb508b5	Add a `spacy benchmark speed` subcommand (#11902) * Add a `spacy evaluate speed` subcommand. This subcommand reports the mean batch performance of a model on a data set with a 95% confidence interval. For reliability, it first performs some warmup rounds. Then it will measure performance on batches with randomly shuffled documents. To avoid having too many spaCy commands, `speed` is a subcommand of `evaluate` and accuracy evaluation is moved to its own `evaluate accuracy` subcommand. * Fix import cycle * Restore `spacy evaluate`, make `spacy benchmark speed` an alias * Add documentation for `spacy benchmark` * CREATES -> PRINTS * WPS -> words/s * Disable formatting of benchmark speed arguments * Fail with an error message when trying to speed bench empty corpus * Make it clearer that `benchmark accuracy` is a replacement for `evaluate` * Fix docstring webpage reference * tests: check `evaluate` output against `benchmark accuracy`	2023-01-12 11:55:21 +01:00
Sofie Van Landeghem	7f6c638c3a	fix processing of "auto" in convert (#12050) * fix processing of "auto" in walk_directory * add check for None * move AUTO check to convert and fix verification of args * add specific CLI test with CliRunner * cleanup * more cleanup * update docstring	2023-01-05 10:21:00 +01:00

21 Commits