spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-12-06 01:34:25 +03:00

Author	SHA1	Message	Date
kadarakos	86d3e78c64	make label mapper private	2023-02-20 17:02:27 +00:00
kadarakos	813b3551ed	Merge branch 'add/exclusive-spancat' of github.com:ljvmiranda921/spaCy into spancat-exclusive	2023-02-20 10:52:34 +00:00
kadarakos	6f3b257cf4	raise error instead of just print	2023-02-20 10:48:41 +00:00
kadarakos	43d5cab2c2	Update spacy/pipeline/spancat.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-02-20 11:37:51 +01:00
kadarakos	e847487ebb	remove duplicate declaration	2023-02-20 10:36:54 +00:00
kadarakos	af3fa670d4	Update spacy/tests/pipeline/test_spancat.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-02-20 11:36:32 +01:00
kadarakos	afc3a5a4af	black	2023-02-10 14:07:39 +00:00
kadarakos	a07aafc28e	refactor make_span_group	2023-02-10 14:06:56 +00:00
kadarakos	a281a7c9a1	tests for make_span_group with negative labels	2023-02-10 14:06:07 +00:00
kadarakos	b98cba2bef	black	2023-02-08 19:45:01 +00:00
kadarakos	43162029bc	bugfix	2023-02-08 19:43:51 +00:00
kadarakos	ec941a128d	single label make_spangroup test	2023-02-08 19:43:33 +00:00
kadarakos	6fc25f64dd	add spans.attrs[scores]	2023-02-07 18:12:32 +00:00
kadarakos	afc3ce1c7e	logical bug in configuration check	2023-02-06 19:05:35 +00:00
kadarakos	5c927effde	mypy	2023-02-06 19:03:33 +00:00
kadarakos	c24b3785a6	replace single_label with add_negative_label and adjust inference	2023-02-06 18:54:30 +00:00
kadarakos	c864f12e28	remove spancat exclusive	2023-02-06 10:15:53 +00:00
kadarakos	b8cdcfb2f5	black	2023-02-02 15:23:05 +00:00
kadarakos	d13e494abd	don't rely on default arguments	2023-02-02 10:36:36 +00:00
kadarakos	5ccb154972	more docstring and fix negative_label	2023-02-01 11:16:34 +00:00
kadarakos	edf9134e45	add docstrings	2023-01-31 17:06:20 +00:00
kadarakos	079f09b97c	black	2023-01-31 16:33:06 +00:00
kadarakos	8a807ef1dd	black	2023-01-31 16:30:12 +00:00
kadarakos	dceeb02b94	wire up different make_spangroups for single and multilabel	2023-01-31 16:27:26 +00:00
kadarakos	52e7324df4	Merge branch 'master' into spancat-exclusive	2023-01-31 16:05:08 +00:00
kadarakos	f1e091a31f	rename spancat_exclusive to singlelable	2023-01-31 16:04:35 +00:00
kadarakos	3f6fd410cc	merge multilabel and singlelabel spancat	2023-01-31 16:04:11 +00:00
kadarakos	330a452f5e	Merge branch 'master' into spancat-exclusive	2023-01-31 16:03:35 +00:00
Raphael Mitsch	02af17a5c8	Remove flaky assertions. (#12210)	2023-01-31 16:52:06 +01:00
Adriane Boyd	606273f7e4	Normalize whitespace in evaluate CLI output test (#12157) * Normalize whitespace in evaluate CLI output test Depending on terminal settings, lines may be padded to the screen width so the comparison is too strict with only the command string replacement. * Move to test util method * Change to normalization method	2023-01-27 16:13:34 +01:00
Adriane Boyd	5f8a398bb9	Add span_id to Span.char_span, update Doc/Span.char_span docs (#12196) * Add span_id to Span.char_span, update Doc/Span.char_span docs `Span.char_span(id=)` should be removed in the future. * Also use Union[int, str] in Doc docstring	2023-01-27 15:09:17 +01:00
Simon Gurcke	774c10fa39	Add alignment_mode argument to Span.char_span() (#12145) * Add alignment_mode argument to Span.char_span() * Update website * Update spacy/tokens/span.pyx Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Add test Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-01-27 11:43:40 +01:00
Peter Baumgartner	c68e6b8a96	`trainable_lemmatizer` in `debug data` (#11419) * WIP * rm ipython embeds * rm total * WIP * cleanup * cleanup + reword * rm component function * remove migration support form * fix reference dataset for dev data * additional fixes - set approach to identifying unique trees - adjust line length on messages - add logic for detecting docs without annotations * use 0 instead of none for no annotation * partial annotation support * initial tests for _compile_gold lemma attributes Using the example data from the edit tree lemmatizer tests for: - lemmatizer_trees - partial_lemma_annotations - n_low_cardinality_lemmas - no_lemma_annotations * adds output test for cli app * switch msg level * rm unclear uniqueness check * Revert "rm unclear uniqueness check" This reverts commit `6ea2b3524b`. * remove good message on uniqueness * formatting * use en_vocab fixture * clarify data set source in messages * remove unnecessary import Co-authored-by: svlandeg <svlandeg@github.com>	2023-01-26 17:36:50 +01:00
Daniël de Kok	8d69874afb	Add `spacy.PlainTextCorpusReader.v1` (#12122) * Add `spacy.PlainTextCorpusReader.v1` This is a corpus reader that reads plain text corpora with the following format: - UTF-8 encoding - One line per document. - Blank lines are ignored. It is useful for applications where we deal with very large corpora, such as distillation, and don't want to deal with the space overhead of serialized formats. Additionally, many large corpora already use such a text format, keeping the necessary preprocessing to a minimum. * Update spacy/training/corpus.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * docs: add version to `PlainTextCorpus` * Add docstring to registry function * Add plain text corpus tests * Only strip newline/carriage return * Add return type _string_to_tmp_file helper * Use a temporary directory in place of file name Different OS auto delete/sharing semantics are just wonky. * This will be new in 3.5.1 (rather than 4) * Test improvements from code review Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-01-26 11:33:22 +01:00
Raphael Mitsch	950fceceb6	Make test_cli_find_threshold() more robust. (#12148)	2023-01-23 14:42:33 +01:00
Richard Hudson	f9e020dd67	Fix speed problem with `top_k>1` on CPU in edit tree lemmatizer (#12017) * Refactor _scores2guesses * Handle arrays on GPU * Convert argmax result to raw integer Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com> * Use NumpyOps() to copy data to CPU Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com> * Changes based on review comments * Use different _scores2guesses depending on tree_k * Add tests for corner cases * Add empty line for consistency * Improve naming Co-authored-by: Daniël de Kok <me@github.danieldk.eu> * Improve naming Co-authored-by: Daniël de Kok <me@github.danieldk.eu> Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com> Co-authored-by: Daniël de Kok <me@github.danieldk.eu>	2023-01-20 19:34:11 +01:00
Adriane Boyd	1e993d3b03	Merge pull request #12121 from adrianeboyd/chore/v3.5.0-2 Revert "Temporarily skip tests that require models/compat"	2023-01-19 15:59:30 +01:00
Adriane Boyd	3b8918e166	API docs: Rename kb_in_memory to inmemorylookupkb, add to sidebar (#12128) * API docs: Rename kb_in_memory to inmemorylookupkb, add to sidebar * adjust to mdx * linkout to InMemoryLookupKB at first occurrence in kb.mdx * fix links to docs * revert Azure trigger setting (I'll make a separate PR) Co-authored-by: svlandeg <svlandeg@github.com>	2023-01-19 13:29:17 +01:00
Adriane Boyd	dc0f527039	Revert "Temporarily skip tests that require models/compat" This reverts commit `378db0eb1e`.	2023-01-18 12:54:56 +01:00
Adriane Boyd	794cea6907	Fix comments and examples for levenshtein_compare (#12113)	2023-01-18 08:02:33 +01:00
Lj Miranda	a722bd8fba	Add suggester to spancat docstrings	2023-01-17 20:38:35 +08:00
Lj Miranda	26d5d637e3	Add suggester documentation in Exclusive_SpanCategorizer	2023-01-17 10:34:21 +08:00
Lj Miranda	e61f0a4035	Update how spancat_exclusive is constructed In this commit, I added the following: - Put the default values of negative_weight and allow_overlap in the default_config dictionary. - Rename make_spancat -> make_exclusive_spancat	2023-01-17 10:17:29 +08:00
Lj Miranda	65ce4347ef	Apply suggestions from code review Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2023-01-17 09:38:47 +08:00
Lj Miranda	bf2f0173d2	Merge branch 'master' into add/exclusive-spancat	2023-01-13 17:30:29 +08:00
github-actions[bot]	9ef7d26032	Auto-format code with black (#12100) Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>	2023-01-13 10:12:10 +01:00
Daniël de Kok	dda7331da3	Handle missing annotations in the edit tree lemmatizer (#12098) The losses/gradients of missing annotations were not correctly masked out. Fix this and check the masking in the partial data test.	2023-01-12 12:13:55 +01:00
Daniël de Kok	319eb508b5	Add a `spacy benchmark speed` subcommand (#11902) * Add a `spacy evaluate speed` subcommand This subcommand reports the mean batch performance of a model on a data set with a 95% confidence interval. For reliability, it first performs some warmup rounds. Then it will measure performance on batches with randomly shuffled documents. To avoid having too many spaCy commands, `speed` is a subcommand of `evaluate` and accuracy evaluation is moved to its own `evaluate accuracy` subcommand. * Fix import cycle * Restore `spacy evaluate`, make `spacy benchmark speed` an alias * Add documentation for `spacy benchmark` * CREATES -> PRINTS * WPS -> words/s * Disable formatting of benchmark speed arguments * Fail with an error message when trying to speed bench empty corpus * Make it clearer that `benchmark accuracy` is a replacement for `evaluate` * Fix docstring webpage reference * tests: check `evaluate` output against `benchmark accuracy`	2023-01-12 11:55:21 +01:00
Paul O'Leary McCann	8e558095a1	Clean up displacy port-related error messages, docs (#12089) * Clean up displacy port-related error messages, docs There were some issues in the error messages and docs in #11948. 1. the error messages didn't specify the port argument to displacy.serve correctly 2. the docs didn't mark the auto select argument as new This addresses those issues. * Update website/docs/api/top-level.md Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com> * Apply prettier Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>	2023-01-12 14:54:09 +09:00
Adriane Boyd	9e0322de1a	Restore v2 token_acc score implementation (#12073) In the v3 scorer refactoring, `token_acc` was implemented incorrectly. It should use `precision` instead of `fscore` for the measure of correctly aligned tokens / number of predicted tokens. Fix the docs to reflect that the measure uses the number of predicted tokens rather than the number of gold tokens.	2023-01-11 08:01:47 +01:00
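
The plain-text corpus format described in commit 8d69874afb (UTF-8 encoding, one document per line, blank lines ignored, only newline/carriage return stripped) can be sketched in plain Python. This is a minimal illustration of the file format only, not spaCy's implementation: the function name `iter_plain_text_docs` is hypothetical, no spaCy API is used, and treating only fully empty lines as blank is an assumption.

```python
import tempfile
from pathlib import Path
from typing import Iterator


def iter_plain_text_docs(path: Path) -> Iterator[str]:
    """Yield one document text per non-empty line of a UTF-8 file.

    Hypothetical helper for illustration; not part of spaCy.
    """
    # newline="" turns off newline translation, so "\r\n" endings arrive
    # unchanged and are stripped explicitly below.
    with path.open("r", encoding="utf-8", newline="") as f:
        for line in f:
            # Strip only the line terminator; other whitespace is preserved.
            text = line.rstrip("\r\n")
            # Assumption: a line counts as blank only if nothing remains.
            if text:
                yield text


# Usage: write a tiny corpus to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    corpus = Path(tmp) / "corpus.txt"
    corpus.write_bytes("First document.\n\nSecond document.\r\n".encode("utf-8"))
    docs = list(iter_plain_text_docs(corpus))

print(docs)  # ['First document.', 'Second document.']
```

Yielding lines lazily keeps memory use flat for very large corpora, which is the use case the commit message gives for the format.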

9322 Commits