spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-04-23 18:41:59 +03:00

Author	SHA1	Message	Date
kadarakos	c7e7343999	Update spacy/pipeline/spancat.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-03-03 15:56:00 +01:00
kadarakos	fded200128	Update spacy/pipeline/spancat.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-03-03 15:55:42 +01:00
kadarakos	b972328337	Update spacy/pipeline/spancat.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-03-03 15:55:15 +01:00
kadarakos	0a74e8c260	Update spacy/pipeline/spancat.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-03-03 15:54:54 +01:00
Adriane Boyd	6182213fef	Merge branch 'master' into add/exclusive-spancat	2023-03-01 15:51:16 +01:00
Sofie Van Landeghem	74cae47bf6	rely on is_empty property instead of __len__ (#12347)	2023-03-01 12:06:07 +01:00
Adriane Boyd	8f058e39bd	Fix error message for displacy auto_select_port (#12343)	2023-02-28 16:36:03 +01:00
TAN Long	071667376a	Add new REL_OPs: `>+`, `>-`, `<+`, and `<-` (#12334) * Add immediate left/right child/parent dependency relations * Add tests for new REL_OPs: `>+`, `>-`, `<+`, and `<-`. --------- Co-authored-by: Tan Long <tanloong@foxmail.com>	2023-02-28 14:36:33 +01:00
lise-brinck	e2de188cf1	Bugfix/swedish tokenizer (#12315) * add unittest for explosion#12311 * create punctuation.py for swedish * removed : from infixes in swedish punctuation.py * allow : as infix if succeeding char is uppercase	2023-02-27 10:53:45 +01:00
Kevin Humphreys	acdd993071	Matcher performance fix for extension predicates: use shared key function (#12272) * standardize predicate key format * single key function * Make optional args in key function keyword-only --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-02-27 08:35:08 +01:00
Paul O'Leary McCann	1e8bac99f3	Add tests for projects to master (#12303) * Add tests for projects to master * Fix git clone related issues on Windows * Add stat import	2023-02-23 10:22:57 +01:00
kadarakos	86d3e78c64	make label mapper private	2023-02-20 17:02:27 +00:00
kadarakos	813b3551ed	Merge branch 'add/exclusive-spancat' of github.com:ljvmiranda921/spaCy into spancat-exclusive	2023-02-20 10:52:34 +00:00
kadarakos	6f3b257cf4	raise error instead of just print	2023-02-20 10:48:41 +00:00
kadarakos	43d5cab2c2	Update spacy/pipeline/spancat.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-02-20 11:37:51 +01:00
kadarakos	e847487ebb	remove duplicate declaration	2023-02-20 10:36:54 +00:00
kadarakos	af3fa670d4	Update spacy/tests/pipeline/test_spancat.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-02-20 11:36:32 +01:00
Adriane Boyd	80bc140533	Add grc to langs with lexeme norms in spacy-lookups-data (#12287)	2023-02-16 17:57:02 +01:00
Edward	61b8454137	Adjust return type of `registry.find` (#12227) * Fix registry find return type * add dot * Add type ignore for mypy * update black formatting version * add mypy ignore to package cli * mypy type fix (for real) * Update find description in spacy/util.py Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com> * adjust mypy directive --------- Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>	2023-02-15 12:32:53 +01:00
kadarakos	afc3a5a4af	black	2023-02-10 14:07:39 +00:00
kadarakos	a07aafc28e	refactor make_span_group	2023-02-10 14:06:56 +00:00
kadarakos	a281a7c9a1	tests for make_span_group with negative labels	2023-02-10 14:06:07 +00:00
kadarakos	b98cba2bef	black	2023-02-08 19:45:01 +00:00
kadarakos	43162029bc	bugfix	2023-02-08 19:43:51 +00:00
kadarakos	ec941a128d	single label make_spangroup test	2023-02-08 19:43:33 +00:00
kadarakos	6fc25f64dd	add spans.attrs[scores]	2023-02-07 18:12:32 +00:00
kadarakos	afc3ce1c7e	logical bug in configuration check	2023-02-06 19:05:35 +00:00
kadarakos	5c927effde	mypy	2023-02-06 19:03:33 +00:00
kadarakos	c24b3785a6	replace single_label with add_negative_label and adjust inference	2023-02-06 18:54:30 +00:00
kadarakos	c864f12e28	remove spancat exclusive	2023-02-06 10:15:53 +00:00
kadarakos	b8cdcfb2f5	black	2023-02-02 15:23:05 +00:00
kadarakos	d13e494abd	don't rely on default arguments	2023-02-02 10:36:36 +00:00
Sofie Van Landeghem	79ef6cf0f9	Have logging calls use string formatting types (#12215) * change logging call for spacy.LookupsDataLoader.v1 * substitutions in language and _util * various more substitutions * add string formatting guidelines to contribution guidelines	2023-02-02 11:15:22 +01:00
kadarakos	5ccb154972	more docstring and fix negative_label	2023-02-01 11:16:34 +00:00
kadarakos	edf9134e45	add docstrings	2023-01-31 17:06:20 +00:00
kadarakos	079f09b97c	black	2023-01-31 16:33:06 +00:00
kadarakos	8a807ef1dd	black	2023-01-31 16:30:12 +00:00
kadarakos	dceeb02b94	wire up different make_spangroups for single and multilabel	2023-01-31 16:27:26 +00:00
kadarakos	52e7324df4	Merge branch 'master' into spancat-exclusive	2023-01-31 16:05:08 +00:00
kadarakos	f1e091a31f	rename spancat_exclusive to singlelable	2023-01-31 16:04:35 +00:00
kadarakos	3f6fd410cc	merge multilabel and singlelabel spancat	2023-01-31 16:04:11 +00:00
kadarakos	330a452f5e	Merge branch 'master' into spancat-exclusive	2023-01-31 16:03:35 +00:00
Raphael Mitsch	02af17a5c8	Remove flaky assertions. (#12210)	2023-01-31 16:52:06 +01:00
Adriane Boyd	606273f7e4	Normalize whitespace in evaluate CLI output test (#12157) * Normalize whitespace in evaluate CLI output test Depending on terminal settings, lines may be padded to the screen width so the comparison is too strict with only the command string replacement. * Move to test util method * Change to normalization method	2023-01-27 16:13:34 +01:00
Adriane Boyd	5f8a398bb9	Add span_id to Span.char_span, update Doc/Span.char_span docs (#12196) * Add span_id to Span.char_span, update Doc/Span.char_span docs `Span.char_span(id=)` should be removed in the future. * Also use Union[int, str] in Doc docstring	2023-01-27 15:09:17 +01:00
Simon Gurcke	774c10fa39	Add alignment_mode argument to Span.char_span() (#12145) * Add alignment_mode argument to Span.char_span() * Update website * Update spacy/tokens/span.pyx Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Add test Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-01-27 11:43:40 +01:00
Peter Baumgartner	c68e6b8a96	`trainable_lemmatizer` in `debug data` (#11419 ) * WIP * rm ipython embeds * rm total * WIP * cleanup * cleanup + reword * rm component function * remove migration support form * fix reference dataset for dev data * additional fixes - set approach to identifying unique trees - adjust line length on messages - add logic for detecting docs without annotations * use 0 instead of none for no annotation * partial annotation support * initial tests for _compile_gold lemma attributes Using the example data from the edit tree lemmatizer tests for: - lemmatizer_trees - partial_lemma_annotations - n_low_cardinality_lemmas - no_lemma_annotations * adds output test for cli app * switch msg level * rm unclear uniqueness check * Revert "rm unclear uniqueness check" This reverts commit `6ea2b3524b`. * remove good message on uniqueness * formatting * use en_vocab fixture * clarify data set source in messages * remove unnecessary import Co-authored-by: svlandeg <svlandeg@github.com>	2023-01-26 17:36:50 +01:00
Daniël de Kok	8d69874afb	Add `spacy.PlainTextCorpusReader.v1` (#12122 ) * Add `spacy.PlainTextCorpusReader.v1` This is a corpus reader that reads plain text corpora with the following format: - UTF-8 encoding - One line per document. - Blank lines are ignored. It is useful for applications where we deal with very large corpora, such as distillation, and don't want to deal with the space overhead of serialized formats. Additionally, many large corpora already use such a text format, keeping the necessary preprocessing to a minimum. * Update spacy/training/corpus.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * docs: add version to `PlainTextCorpus` * Add docstring to registry function * Add plain text corpus tests * Only strip newline/carriage return * Add return type _string_to_tmp_file helper * Use a temporary directory in place of file name Different OS auto delete/sharing semantics are just wonky. * This will be new in 3.5.1 (rather than 4) * Test improvements from code review Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-01-26 11:33:22 +01:00
Raphael Mitsch	950fceceb6	Make test_cli_find_threshold() more robust. (#12148)	2023-01-23 14:42:33 +01:00
Richard Hudson	f9e020dd67	Fix speed problem with `top_k>1` on CPU in edit tree lemmatizer (#12017 ) * Refactor _scores2guesses * Handle arrays on GPU * Convert argmax result to raw integer Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com> * Use NumpyOps() to copy data to CPU Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com> * Changes based on review comments * Use different _scores2guesses depending on tree_k * Add tests for corner cases * Add empty line for consistency * Improve naming Co-authored-by: Daniël de Kok <me@github.danieldk.eu> * Improve naming Co-authored-by: Daniël de Kok <me@github.danieldk.eu> Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com> Co-authored-by: Daniël de Kok <me@github.danieldk.eu>	2023-01-20 19:34:11 +01:00


9336 Commits