spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-08-08 14:14:57 +03:00

Author	SHA1	Message	Date
Paul O'Leary McCann	ef5762d78e	Bad hack to get tests to run This changes the tok2vec size in coref to hardcoded 64 to get tests to run. This should be reverted and hopefully replaced with proper shape inference.	2022-06-28 19:06:13 +09:00
Paul O'Leary McCann	af6d5ae2fe	Initial test of mismatched tokenization This runs, but the results are nonsense because the indices are off.	2022-06-28 19:05:47 +09:00
Paul O'Leary McCann	16894e665d	Refactor Coval Scoring code (#10875) * Move coref scoring code to scorer.py Includes some renames to make names less generic. * Refactor coval code to remove ternary expressions * Black formatting * Add header * Make scorers into registered scorers * Small test fixes * Skip coref tests when torch not present Coref can't be loaded without Torch, so nothing works. * Fix remaining type issues Some of this just involves ignoring types in thorny areas. Two main issues: 1. Some things have weird types due to indirection/ argskwargs 2. xp2torch return type seems to have changed at some point * Update spacy/scorer.py Co-authored-by: kadarakos <kadar.akos@gmail.com> * Small changes from review * Be specific about the ValueError * Type fix Co-authored-by: kadarakos <kadar.akos@gmail.com>	2022-06-22 16:05:52 +09:00
Paul O'Leary McCann	196886bbca	Fix coref size inference (#10916) * Add explicit tok2vec_size parameter in clusterer * Add tok2vec size to span predictor config * Minor fixes	2022-06-08 20:03:41 +09:00
svlandeg	aa2eb2789c	small type fixes	2022-05-25 13:50:54 +02:00
svlandeg	cea40c9d7b	fix types + black formatting	2022-05-25 13:34:09 +02:00
svlandeg	3fee6933d7	Merge branch 'feature/coref' of https://github.com/explosion/spacy into feature/coref	2022-05-25 13:12:42 +02:00
svlandeg	b8bdf998ad	fix types in scorer + black	2022-05-25 13:12:37 +02:00
Adriane Boyd	f75a528787	Update spacy/ml/models/spancat.py	2022-05-25 13:05:41 +02:00
svlandeg	015050f42c	Merge branch 'master' into feature/coref	2022-05-25 13:01:56 +02:00
Paul O'Leary McCann	838f50192b	Black formatting	2022-05-25 19:20:03 +09:00
Paul O'Leary McCann	2a8efda689	Code review suggestions, cleanup	2022-05-25 19:18:26 +09:00
Paul O'Leary McCann	e721c7bed8	Import cleanup	2022-05-25 19:12:20 +09:00
Paul O'Leary McCann	6087da9675	Suggestions from code review, cleanup, typing	2022-05-25 19:11:48 +09:00
Paul O'Leary McCann	6999436270	Fix coref tests	2022-05-25 18:32:47 +09:00
Paul O'Leary McCann	303269c4b2	Skip coref test if no torch	2022-05-25 18:26:31 +09:00
kadarakos	f6a4b80c0b	Better errors for has_annotation and Matcher (#10830) * Show input argument instead of None * catch invalid attr early * moved error message from code to errors.py * Update spacy/errors.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update spacy/errors.py * update E153 and E154 Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-05-25 11:12:29 +02:00
Sofie Van Landeghem	83ed1f391b	Remove NBSP's across tables in the docs (#10842)	2022-05-25 09:48:39 +02:00
Richard Hudson	32954c3bcb	Fix issues for Mypy 0.950 and Pydantic 1.9.0 (#10786) * Make changes to typing * Correction * Format with black * Corrections based on review * Bumped Thinc dependency version * Bumped blis requirement * Correction for older Python versions * Update spacy/ml/models/textcat.py Co-authored-by: Daniël de Kok <me@github.danieldk.eu> * Corrections based on review feedback * Readd deleted docstring line Co-authored-by: Daniël de Kok <me@github.danieldk.eu>	2022-05-25 09:33:54 +02:00
Paul O'Leary McCann	3807a1ba74	Merge pull request #10844 from polm/feature/coref-torch-guard Add guards around torch import for coref	2022-05-25 13:50:46 +09:00
Paul O'Leary McCann	c9233a5a1f	Import torch from thinc	2022-05-24 17:28:27 +09:00
Paul O'Leary McCann	5cbc9f4573	Use thinc.util.has_torch	2022-05-24 16:02:39 +09:00
Paul O'Leary McCann	b1118cee58	Move epsilon	2022-05-24 15:59:08 +09:00
Paul O'Leary McCann	9da16df96e	Add guards around torch import Torch is required for the coref/spanpred models but shouldn't be required for spaCy in general. The one tricky part of this is that one function in coref_util relied on torch, but that file was imported in several places. Since the function was only used in one place I moved it there.	2022-05-24 15:16:25 +09:00
Paul O'Leary McCann	6be09bbd07	Fix Entity Linker with tokenization mismatches (fix #9575) (#10457) * Add failing test * Partial fix for issue This kind of works. The issue with token length mismatches is gone. The problem is that when you get empty lists of encodings to compare, it fails because the sizes are not the same, even though they're both zero: (0, 3) vs (0,). Not sure why that happens... * Short circuit on empties * Remove spurious check The check here isn't needed now that the short circuit is fixed. * Update spacy/tests/pipeline/test_entity_linker.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Use "eg", not "example" Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2022-05-23 20:42:26 +02:00
Lj Miranda	1d34aa2b3d	Add spacy-span-analyzer to debug data (#10668) * Rename to spans_key for consistency * Implement spans length in debug data * Implement how span bounds and spans are obtained In this commit, I implemented how span boundaries (the tokens) around a given span and spans are obtained. I've put them in the compile_gold() function so that it's accessible later on. I will do the actual computation of the span and boundary distinctiveness in the main function above. * Compute for p_spans and p_bounds * Add computation for SD and BD * Fix mypy issues * Add weighted average computation * Fix compile_gold conditional logic * Add test for frequency distribution computation * Add tests for kl-divergence computation * Fix weighted average computation * Make tables more compact by rounding them * Add more descriptive checks for spans * Modularize span computation methods In this commit, I added the _get_span_characteristics and _print_span_characteristics functions so that they can be reusable anywhere. * Remove unnecessary arguments and make fxs more compact * Update a few parameter arguments * Add tests for print_span and get_span methods * Update API to talk about span characteristics in brief * Add better reporting of spans_length * Add test for span length reporting * Update formatting of span length report Removed '' to indicate that it's not a string, then sort the n-grams by their length, not by their frequency. * Apply suggestions from code review Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Show all frequency distribution when -V In this commit, I displayed the full frequency distribution of the span lengths when --verbose is passed. To make things simpler, I rewrote some of the formatter functions so that I can call them whenever. Another notable change is that instead of showing percentages as integers, I showed them as floats (max 2-decimal places). I did this because it looks weird when it displays (0%).
* Update logic on how total is computed The way the 90% thresholding is computed now is that we keep adding the percentages until we reach >= 90%. I also updated the wording and used the term "At least" to denote that >= 90% of your spans have these distributions. * Fix display when showing the threshold percentage * Apply suggestions from code review Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Add better phrasing for span information * Update spacy/cli/debug_data.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Add minor edits for whitespaces etc. Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-05-23 19:06:38 +02:00
Peter Baumgartner	7ce3460b23	add floret to static vectors docs (#10833)	2022-05-23 09:16:31 +02:00
kadarakos	a3814ee739	oov confusion fix (#10828)	2022-05-23 09:15:51 +02:00
Madeesh Kannan	4fb1809c72	Disable weekly GPU/slow tests on forks (#10831)	2022-05-20 15:46:30 +02:00
Adriane Boyd	a82ec56aae	Remove cuda extras for non-linux arm in install widget (#10796) * Remove cuda extras for non-linux arm platforms in install widget * Extend cuda versions install widget * Update GPU install docs to clarify cuda	2022-05-20 09:57:41 +02:00
Paul O'Leary McCann	46982cf694	Add glossary entry for root (#10821) * Add glossary entry for root There was already one but it was lower case, maybe that should be removed? * remove lowercase root On reflection, that was probably just a mistake. * Add lowercase root back It's harmless to leave it there.	2022-05-20 09:56:32 +02:00
Paul O'Leary McCann	e38e84a677	Merge pull request #10812 from kadarakos/feature/coref Feature/coref	2022-05-19 16:39:00 +09:00
kadarakos	1dc3894447	new parameters	2022-05-17 15:36:32 +00:00
Raphael Mitsch	357be2614e	Fuzz tokenizer.explain: draft for fuzzy tests. (#10771) * Fuzz tokenizer.explain: draft for fuzzy tests. * Fuzz tokenizer.explain: xignoring tokenizer.explain() tests. Removed deadline modification. Removed LANGUAGES_WITHOUT_TOKENIZERS. * Fuzz tokenizer.explain: changed tokenizer initialization to avoid failures in Azure runs. * Fuzz tokenizer.explain: type hint for tokenizer in test. Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-05-17 10:23:16 +02:00
kadarakos	403fb95d56	merge	2022-05-17 06:56:34 +00:00
Paul O'Leary McCann	2e8f0e9168	Rename coref params	2022-05-16 16:50:10 +09:00
github-actions[bot]	99aeaf9bd3	Auto-format code with black (#10795) Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>	2022-05-13 19:02:08 +02:00
Paul O'Leary McCann	13481fbcc2	Remove unused param, add TODOs about typing	2022-05-13 19:29:28 +09:00
Paul O'Leary McCann	6a8625e711	First draft for architecture docs These parameters are probably going to be renamed / have defaults adjusted. Also Model types are off.	2022-05-13 19:28:55 +09:00
kadarakos	fd36469900	bugfix parser labels (#10797)	2022-05-13 11:41:32 +02:00
Paul O'Leary McCann	7634a488fe	Merge pull request #10793 from Schero1994/feature/update Update spaCy Universe: spacytextblob (code example)	2022-05-13 12:07:37 +09:00
schaeran	f5952c0851	update spaCy Universe: spacytextblob (code example)	2022-05-12 18:23:00 +02:00
Patrick Düggelin	cb06309ed8	Fix PhraseMatcher remove overlapping terms (#10734) * Add regression test for issue 10643 * Improve overlapping terms testcase * Fix removing overlapping terms in phrase matcher (#10643)	2022-05-12 12:23:52 +02:00
Raphael Mitsch	6f9e2ca81f	Ignore overrides for pipe names in config argument (#10779) * Pipe name override in config: added check with warning, added removal of name override from config, extended tests. * Pipe name override in config: added pytest UserWarning. Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-05-12 11:46:08 +02:00
Adriane Boyd	b65d652881	Override SpanGroups.setdefault to provide default SpanGroup (#10772) * Fix mistake in SpanGroup API docs * Restrict SpanGroups.setdefault to SpanGroup only * Refactor to support default span iterable	2022-05-12 10:06:25 +02:00
Paul O'Leary McCann	14eb20f07a	Add span predictor docs	2022-05-12 13:47:06 +09:00
kadarakos	b7ac4b33e2	fixing arguments	2022-05-11 14:59:59 +00:00
Richard Hudson	d524f6415f	Add documentation tip about overriding variables (#10780)	2022-05-11 10:15:32 +02:00
Paul O'Leary McCann	57165f9631	Merge pull request #10782 from kadarakos/feature/coref Feature/coref	2022-05-11 17:02:21 +09:00
kadarakos	7cf6bcca0e	merge misery	2022-05-10 17:19:16 +00:00


15605 Commits