spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-12-05 01:04:45 +03:00

Author	SHA1	Message	Date
github-actions[bot]	015d439eb6	Auto-format code with black (#9234 ) Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>	2021-09-20 08:49:19 +02:00
Paul O'Leary McCann	c4f0800fb8	Validate pos values when creating Doc (#9148 ) * Validate pos values when creating Doc * Add clear error when setting invalid pos This also changes the error language slightly. * Fix variable name * Update spacy/tokens/doc.pyx * Test that setting invalid pos raises an error Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-09-16 13:28:05 +02:00
Jozef Harag	865cfbc903	feat: add `spacy.WandbLogger.v3` with optional `run_name` and `entity` parameters (#9202 ) * feat: add `spacy.WandbLogger.v3` with optional `run_name` and `entity` parameters * update versioning in docs Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>	2021-09-16 12:26:41 +02:00
Ines Montani	20f63e7154	Only include runtime-relevant config in package CLI dependency detection (#9211 )	2021-09-15 23:16:01 +02:00
Adriane Boyd	d74870d38c	Prepare for v3.1.3 (#9200 ) * Update thinc and spacy-legacy requirements * Set version to v3.1.3	2021-09-14 11:03:51 +02:00
j-frei	462b009648	Correct parser.py use_upper param info (#9180 )	2021-09-10 16:19:58 +02:00
Adriane Boyd	aba6ce3a43	Handle spacy-legacy in package CLI for dependencies (#9163 ) * Handle spacy-legacy in package CLI for dependencies * Implement legacy backoff in spacy registry.find * Remove unused import * Update and format test	2021-09-08 11:46:40 +02:00
github-actions[bot]	584fae5807	Auto-format code with black (#9130 ) Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>	2021-09-03 10:47:03 +02:00
Kevin Humphreys	ca93504660	Pass alignments to Matcher callbacks (#9001 ) * pass alignments to callbacks * refactor for single callback loop * Update spacy/matcher/matcher.pyx Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-09-02 12:58:05 +02:00
Sofie Van Landeghem	8895e3c9ad	matcher doc corrections (#9115 ) * update error message to current UX * clarify uppercase effect * fix docstring	2021-09-02 09:26:33 +02:00
Robyn Speer	d60b748e3c	Fix surprises when asking for the root of a git repo (#9074 ) * Fix surprises when asking for the root of a git repo In the case of the first asset I wanted to get from git, the data I wanted was the entire repository. I tried leaving "path" blank, which gave a less-than-helpful error, and then I tried `path: "/"`, which started copying my entire filesystem into the project. The path I should have used was "". I've made two changes to make this smoother for others: - The 'path' within a git clone defaults to "" - If the path points outside of the tmpdir that the git clone goes into, we fail with an error Signed-off-by: Elia Robyn Speer <elia@explosion.ai> * use a descriptive error instead of a default plus some minor fixes from PR review Signed-off-by: Elia Robyn Speer <elia@explosion.ai> * check for None values in assets Signed-off-by: Elia Robyn Speer <elia@explosion.ai> Co-authored-by: Elia Robyn Speer <elia@explosion.ai>	2021-09-01 22:52:08 +02:00
Paul O'Leary McCann	f803a84571	Fix inference of epoch_resume (#9084 ) * Fix inference of epoch_resume When an epoch_resume value is not specified individually, it can often be inferred from the filename. The value inference code was there but the value wasn't passed back to the training loop. This also adds a specific error in the case where no epoch_resume value is provided and it can't be inferred from the filename. * Add new error * Always use the epoch resume value if specified Before this the value in the filename was used if found	2021-09-01 14:17:42 +09:00
Adriane Boyd	1e9b4b55ee	Pass overrides to subcommands in workflows (#9059 ) * Pass overrides to subcommands in workflows * Add missing docstring	2021-08-30 09:23:54 +02:00
Sofie Van Landeghem	1e974de837	config is not Optional (#9024 )	2021-08-27 11:44:31 +02:00
github-actions[bot]	fb9c31fbda	Auto-format code with black (#9065 ) Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>	2021-08-27 11:42:27 +02:00
Sofie Van Landeghem	4d39430b82	Document use-case of freezing tok2vec (#8992 ) * update error msg * add sentence to docs * expand note on frozen components	2021-08-26 09:50:35 +02:00
Sofie Van Landeghem	94fb840443	fix docs for Span constructor arguments (#9023 )	2021-08-25 16:06:22 +02:00
David Strouk	31e9b126a0	Fix verbs list in lang/fr/tokenizer_exceptions.py (#9033 )	2021-08-25 15:55:09 +02:00
Ines Montani	4cd052e81d	Include component factories in third-party dependencies resolver (#9009 ) * Include component factories in third-party dependencies resolver * Increment catalogue and update test	2021-08-25 14:58:01 +02:00
Sofie Van Landeghem	e1f88de729	bump to 3.1.2 (#9008 )	2021-08-20 12:41:09 +02:00
Sofie Van Landeghem	4d52d7051c	Fix spancat training on nested entities (#9007 ) * overfitting test on non-overlapping entities * add failing overfitting test for overlapping entities * failing test for list comprehension * remove test that was put in separate PR * bugfix * cleanup	2021-08-20 12:37:50 +02:00
Paul O'Leary McCann	9cc3dc2b67	Add glossary entry for _SP (#8983 )	2021-08-20 12:04:02 +02:00
Sofie Van Landeghem	de025beb5f	Warn and document spangroup.doc weakref (#8980 ) * test for error after Doc has been garbage collected * warn about using a SpanGroup when the Doc has been garbage collected * add warning to the docs * rephrase slightly * raise error instead of warning * update * move warning to doc property	2021-08-20 11:06:19 +02:00
Adriane Boyd	6722dc3dc5	Fix allow_overlap default for spancat scoring (#8970 ) * Remove irrelevant default options	2021-08-18 09:56:56 +02:00
Steele Farnsworth	b18cb1cd2a	Refactor dependencymatcher.pyx to use list comps and enumerate. (#8956 ) * Refactor to use list comps and enumerate. Replace loops that append to a list with list comprehensions where this does not change the behavior; replace range(len(...)) loops with enumerate. Correct one typo in a comment. Replace a call to set() with a set literal. * Undo double assignment. Expand `tokens_to_key[j] = k = self._get_matcher_key(key, i, j)` to two statements. Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Sign contributors agreement Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-08-18 09:55:45 +02:00
Ines Montani	d94ddd5686	Auto-detect package dependencies in spacy package (#8948 ) * Auto-detect package dependencies in spacy package * Add simple get_third_party_dependencies test * Import packages_distributions explicitly * Inline packages_distributions * Fix docstring [ci skip] * Relax catalogue requirement * Move importlib_metadata to spacy.compat with note * Include license information [ci skip]	2021-08-17 14:05:13 +02:00
Sofie Van Landeghem	0a6b68848f	Fix making span_group (#8975 ) * fix _make_span_group * fix imports	2021-08-17 10:36:34 +02:00
github-actions[bot]	92071326d8	Auto-format code with black (#8950 ) Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>	2021-08-13 11:48:38 +02:00
Adriane Boyd	8448c7dbc5	Update da trf recommendation (#8921 ) Update the da trf recommendation to the same model used in the pretrained pipelines.	2021-08-12 13:54:02 +02:00
Paul O'Leary McCann	e227d24d43	Allow passing in array vars for speedup (#8882 ) * Allow passing in array vars for speedup This fixes #8845. Not sure about the docstring changes here... * Update docs Types maybe need more detail? Maybe not? * Run prettier on docs * Update spacy/tokens/span.pyx Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-08-10 15:13:53 +02:00
Paul O'Leary McCann	6029cfc391	Add scores to output in spancat (#8855 ) * Add scores to output in spancat This exposes the scores as an attribute on the SpanGroup. Includes a basic test. * Add basic doc note * Vectorize score calcs * Add "annotation format" section * Update website/docs/api/spancategorizer.md Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Clean up doc section * Ran prettier on docs * Get arrays off the gpu before iterating over them * Remove int() calls Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2021-08-10 13:47:49 +02:00
Eduard Zorita	439f30faad	Add stub files for main cython classes (#8427 ) * Add stub files for main API classes * Add contributor agreement for ezorita * Update types for ndarray and hash() * Fix __getitem__ and __iter__ * Add attributes of Doc and Token classes * Overload type hints for Span.__getitem__ * Fix type hint overload for Span.__getitem__ Co-authored-by: Luca Dorigo <dorigoluca@gmail.com>	2021-08-07 12:30:03 +02:00
github-actions[bot]	56d4d87aeb	Auto-format code with black (#8895 ) Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>	2021-08-06 13:38:06 +02:00
Kabir Khan	1dfffe5fb4	No output info message in train (#8885 ) * Add info message that no output directory was provided in train * Update train.py * Fix logging	2021-08-05 09:21:22 +02:00
Adriane Boyd	fa2e7a4bbf	Fix spancat tests on GPU (#8872 ) * Fix spancat tests on GPU * Fix more spancat tests	2021-08-04 14:29:43 +02:00
Paul O'Leary McCann	77d698dcae	Fix check for RIGHT_ATTRS in dep matcher (#8807 ) * Fix check for RIGHT_ATTRS in dep matcher If a non-anchor node does not have RIGHT_ATTRS, the dep matcher throws an E100, which says that non-anchor nodes must have LEFT_ID, REL_OP, and RIGHT_ID. It specifically does not say RIGHT_ATTRS is required. A blank RIGHT_ATTRS is also valid, and patterns with one will be accepted. While not normal, sometimes a REL_OP is enough to specify a non-anchor node - maybe you just want the head of another node unconditionally, for example. This change just sets RIGHT_ATTRS to {} if not present. Alternatively changing E100 to state RIGHT_ATTRS is required could also be reasonable. * Fix test This test was written on the assumption that if `RIGHT_ATTRS` isn't present an error will be raised. Since the proposed changes make it so an error won't be raised this is no longer necessary. * Revert test, update error message Error message now lists missing keys, and RIGHT_ATTRS is required. * Use list of required keys in error message Also removes unused key param arg.	2021-08-04 09:20:41 +02:00
Adriane Boyd	941a591f3c	Pass excludes when serializing vocab (#8824 ) * Pass excludes when serializing vocab Additional minor bug fix: * Deserialize vocab in `EntityLinker.from_disk` * Add test for excluding strings on load * Fix formatting	2021-08-03 14:42:44 +02:00
Adriane Boyd	175847f92c	Support list values and INTERSECTS in Matcher (#8784 ) * Support list values and IS_INTERSECT in Matcher * Support list values as token attributes for set operators, not just as pattern values. * Add `IS_INTERSECT` operator. * Fix incorrect `ISSUBSET` and `ISSUPERSET` in schema and docs. * Rename IS_INTERSECT to INTERSECTS	2021-08-02 19:39:26 +02:00
Adriane Boyd	fbbbda1954	Fix start/end chars for empty and out-of-bounds spans (#8816 )	2021-08-02 19:07:19 +02:00
Adriane Boyd	9ad3b8cf8d	Only add sourced vectors hashes to meta if necessary (#8830 )	2021-08-02 18:22:35 +02:00
Nick Sorros	0485cdefcc	Add logger debug for project push and pull (#8860 ) * Add logger debug for project push and pull * Sign contributor agreement	2021-08-02 18:13:53 +02:00
themrmax	de076194c4	Make ConsoleLogger flush after each logging line (#8810 ) This is necessary to avoid "logging blackouts" when running training on Kubernetes pods	2021-08-02 14:33:38 +02:00
Ines Montani	7f21c7dfa2	Merge pull request #8794 from explosion/autoblack Auto-format code with black	2021-07-27 12:17:15 +10:00
Paul O'Leary McCann	284b530c63	Respect the no_skip value Seems like the logic for this was just left out. See #8796.	2021-07-24 15:31:17 +09:00
explosion-bot	a58ab6ea22	Auto-format code with black	2021-07-23 08:04:09 +00:00
Adriane Boyd	6bbc2b1956	Reload train corpus in debug data after initialize (#8776 )	2021-07-21 22:38:40 +02:00
Adriane Boyd	d48c01a6f7	Remove extraneous grc test file (#8768 )	2021-07-20 15:51:15 +02:00
Sofie Van Landeghem	ffaead8fe0	bump to 3.1.1	2021-07-19 14:48:27 +02:00
Sofie Van Landeghem	83e27d262e	negative tag annotation (#8731 ) * unit test to unlearn tag via negative annotation * bump thinc to 8.0.8	2021-07-19 14:39:11 +02:00
Adriane Boyd	0e4b96c97e	Update lexeme ranks for loaded vectors (#8640 ) Update the ranks for any lexemes that have been added to the vocab before the vectors are added to the model.	2021-07-19 18:25:54 +10:00


8784 Commits