spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-09-10 14:12:37 +03:00

Author	SHA1	Message	Date
github-actions[bot]	71884d0942	Auto-format code with black (#11427) Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>	2022-09-02 11:43:20 +02:00
Madeesh Kannan	d1760ebe02	Better handling of unexpected types in `SetPredicate` (#11312) * `Matcher`: Better type checking of values in `SetPredicate` `SetPredicate`: Emit warning and return `False` on unexpected value types * Rename `value_type_mismatch` variable * Inline warning * Remove unexpected type warning from `_SetPredicate` * Ensure that `str` values are not interpreted as sequences Check elements of sequence values for convertibility to `str` or `int` * Add more `INTERSECT` and `IN` test cases * Test for inputs with multiple characters * Return `False` early instead of using a boolean flag * Remove superfluous `int` check, parentheses * Apply suggestions from code review Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Appy suggestions from code review * Clarify test comment Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-09-02 09:09:48 +02:00
Adriane Boyd	78f5503a29	Check for any non-Doc returned value for components (#11424)	2022-09-01 19:37:23 +02:00
Madeesh Kannan	604a7c3c26	`SpanGroup(s)`-related optimizations (#11380) * `SpanGroup`: Add support for binding copies to a new reference document * `SpanGroups`: Replace superfluous serialize-deserialize roundtrip in `copy` Instead, directly copy the in-memory representations of the constituent `SpanGroup`s. * Update `SpanGroup.copy()` signature * Rename `new_doc` param to `doc` * Fix kwdarg * Update `.pyi` file and docstrings * `mypy` fix * Update spacy/tokens/span_group.pyx * Update docs Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-08-31 09:03:20 +02:00
Sofie Van Landeghem	8fc0efc502	Allow string argument for disable/enable/exclude (#11406) * adding unit test for spacy.load with disable/exclude string arg * allow pure strings in from_config * update docs * upstream type adjustements * docs update * make docstring more consistent * Update spacy/language.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * two more cleanups * fix type in internal method Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-08-31 09:02:34 +02:00
Daniël de Kok	3f4b4b7b4f	Fix `test_{prefer,require}_gpu` (#11390) * Fix `test_{prefer,require}_gpu` These tests assumed that GPUs are only supported with CuPy, but since Thinc 8.1 we also support Metal Performance Shaders. * test_misc: arrange thinc imports to be together	2022-08-30 14:21:02 +02:00
Patrick J. Burns	5ae63b1fbd	Add Latin language support (#11349 ) * Add lang folder for la (Latin) * Add Latin lang classes * Add minimal tokenizer exceptions * Add minimal stopwords * Add minimal lex_attrs * Update stopwords, tokenizer exceptions * Add la tests; register la_tokenizer in conftest.py * Update spacy/lang/la/lex_attrs.py Remove duplicate form in Latin lex_attrs Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update natto-py version spec (#11222) * Update natto-py version spec * Update setup.cfg Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Add scorer to textcat API docs config settings (#11263) * Update docs for pipeline initialize() methods (#11221) * Update documentation for dependency parser * Update documentation for trainable_lemmatizer * Update documentation for entity_linker * Update documentation for ner * Update documentation for morphologizer * Update documentation for senter * Update documentation for spancat * Update documentation for tagger * Update documentation for textcat * Update documentation for tok2vec * Run prettier on edited files * Apply similar changes in transformer docs * Remove need to say annotated example explicitly I removed the need to say "Must contain at least one annotated Example" because it's often a given that Examples will contain some gold-standard annotation. 
* Run prettier on transformer docs * chore: add 'concepCy' to spacy universe (#11255) * chore: add 'concepCy' to spacy universe * docs: add 'slogan' to concepCy * Support full prerelease versions in the compat table (#11228) * Support full prerelease versions in the compat table * Fix types * adding spans to doc_annotation in Example.to_dict (#11261) * adding spans to doc_annotation in Example.to_dict * to_dict compatible with from_dict: tuples instead of spans * use strings for label and kb_id * Simplify test * Update data formats docs Co-authored-by: Stefanie Wolf <stefanie.wolf@vitecsoftware.com> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Fix regex invalid escape sequences (#11276) * Add W605 to the errors raised by flake8 in the CI (#11283) * Clean up automated label-based issue handling (#11284) * Clean up automated label-based issue handline 1. upgrade tiangolo/issue-manager to latest 2. move needs-more-info to tiangolo 3. change needs-more-info close time to 7 days 4. 
delete old needs-more-info config * Use old, longer message * Fix label name * Fix Dutch noun chunks to skip overlapping spans (#11275) * Add test for overlapping noun chunks * Skip overlapping noun chunks * Update spacy/tests/lang/nl/test_noun_chunks.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Docs: displaCy documentation - data types, `parse_{deps,ents,spans}`, spans example (#10950) * add in spans example and parse references * rm autoformatter * rm extra ents copy * TypedDict draft * type fixes * restore non-documentation files * docs update * fix spans example * fix hyperlinks * add parse example * example fix + argument fix * fix api arg in docs * fix bad variable replacement * fix spacing in style Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * fix spacing on table * fix spacing on table * rm temp files Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * include span_ruler for default warning filter (#11333) * Add uk pipelines to website (#11332) * Check for . in factory names (#11336) * Make fixes for PR #11349 * Fix roman numeral coverage in #11349 Co-authored-by: Patrick J. Burns <patricks@diyclassics.org> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: Lj Miranda <12949683+ljvmiranda921@users.noreply.github.com> Co-authored-by: Jules Belveze <32683010+JulesBelveze@users.noreply.github.com> Co-authored-by: stefawolf <wlf.ste@gmail.com> Co-authored-by: Stefanie Wolf <stefanie.wolf@vitecsoftware.com> Co-authored-by: Peter Baumgartner <5107405+pmbaumgartner@users.noreply.github.com>	2022-08-30 14:04:54 +02:00
Paul O'Leary McCann	aafee5e1b7	Fix lookup usage in French/Catalan (fix #11347) (#11382) * Fix lookup usage (fix #11347) Before using the lookups table in the French (and Catalan) lemmatizers, there's a check to see if the current term is in the table. But it's checking a string against hashes, so it's always false. Also the table lookup function is designed so you don't have to do that anyway. * Use the lookup table directly * Use string, not token	2022-08-29 10:32:38 +02:00
Edward	6723d76f24	Add ConsoleLogger.v2 (#11214) * Init * Change logger to ConsoleLogger.v2 * adjust naming * More naming adjustments * Fix output_file reference error * ignore type * Add basic test for logger * Hopefully fix mypy issue * mypy ignore line * Update mypy line Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update test method name Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Change file saving logic * Fix finalize method * increase spacy-legacy version in requirements * Update docs * small adjustments Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-08-29 10:23:05 +02:00
Adriane Boyd	ba33200979	Remove pathy from pyproject.toml (#11383)	2022-08-26 16:07:16 +02:00
Paul O'Leary McCann	7a2c58864c	Move deps outside explosion to "third-party" (#11381)	2022-08-26 10:23:10 +02:00
Adriane Boyd	6fd3b4d9d6	Merge pull request #11375 from adrianeboyd/chore/update-develop-from-master-v3.5-1 Update develop from master for v3.5	2022-08-24 20:41:25 +02:00
Adriane Boyd	81874265e9	Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.5-1	2022-08-24 12:47:42 +02:00
Tobius Saul	c09d2fa25b	luganda language extension (#10847 ) * luganda language extension * __init__.py changes * New enhancements * Lexical attribute changed * punctuaction and sentence additions * Remove comment header * Fix typos, reformat * reformated version * Add tokenizer test * Remove contractions from stop words * Format * Add Luganda to website Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-08-23 13:09:36 +02:00
Edward	5afa98aabf	Support custom attributes for tokens and spans in json conversion (#11125 ) * Add token and span custom attributes to to_json() * Change logic for to_json * Add functionality to from_json * Small adjustments * Move token/span attributes to new dict key * Fix test * Fix the same test but much better * Add backwards compatibility tests and adjust logic * Add test to check if attributes not set in underscore are not saved in the json * Add tests for json compatibility * Adjust test names * Fix tests and clean up code * Fix assert json tests * small adjustment * adjust naming and code readability * Adjust naming, added more tests and changed logic * Fix typo * Adjust errors, naming, and small test optimization * Fix byte tests * Fix bytes tests * Change naming and json structure * update schema * Update spacy/schemas.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update spacy/tokens/doc.pyx Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update spacy/tokens/doc.pyx Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update spacy/schemas.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update schema for underscore attributes * Adjust underscore schema * adjust schema tests Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-08-23 10:05:02 +02:00
Tal Zussman	7e75327893	Fix menu order in linguistic-features.md (#11364) Swap 'Vectors & Similarity' and 'Mappings & Exceptions' in menu to match order in body	2022-08-23 14:40:38 +09:00
Sofie Van Landeghem	6e20842370	dev docs: numeric comparators (#11334) * add section on numeric comparators * edit * prettier * Update extra/DEVELOPER_DOCS/Code Conventions.md Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * note on typing imports Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-08-22 15:52:53 +02:00
Adriane Boyd	f55bb7470d	Clean up warnings in the test suite (#11331)	2022-08-22 12:04:30 +02:00
Paul O'Leary McCann	0f07defe2c	Remove reference to voting on issue (#11335) Not clear which issue this refers to, we don't suggest this for any other issues, and we don't use votes in general.	2022-08-22 11:29:05 +02:00
Adriane Boyd	04c6e5cb95	Improve floret vectors display in pipeline docs (#11343)	2022-08-22 11:28:13 +02:00
Adriane Boyd	5fa8f4faca	Switch ru and uk lemmatizers to pymorphy3 (#11345) * Switch ru and uk lemmatizers to pymorphy3 * Switch to pymorphy3 in tests	2022-08-22 11:27:14 +02:00
Adriane Boyd	3e4cf1bbe1	Check for . in factory names (#11336)	2022-08-19 09:52:12 +02:00
Adriane Boyd	09b3118b26	Add uk pipelines to website (#11332)	2022-08-18 14:04:57 +02:00
Sofie Van Landeghem	cab263791f	include span_ruler for default warning filter (#11333)	2022-08-17 19:55:54 +02:00
Peter Baumgartner	db7b9938a4	Docs: displaCy documentation - data types, `parse_{deps,ents,spans}`, spans example (#10950 ) * add in spans example and parse references * rm autoformatter * rm extra ents copy * TypedDict draft * type fixes * restore non-documentation files * docs update * fix spans example * fix hyperlinks * add parse example * example fix + argument fix * fix api arg in docs * fix bad variable replacement * fix spacing in style Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * fix spacing on table * fix spacing on table * rm temp files Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2022-08-16 11:23:34 -04:00
Adriane Boyd	ed4ad309e6	Fix Dutch noun chunks to skip overlapping spans (#11275) * Add test for overlapping noun chunks * Skip overlapping noun chunks * Update spacy/tests/lang/nl/test_noun_chunks.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2022-08-10 09:49:08 +02:00
Paul O'Leary McCann	231a17817d	Clean up automated label-based issue handling (#11284) * Clean up automated label-based issue handline 1. upgrade tiangolo/issue-manager to latest 2. move needs-more-info to tiangolo 3. change needs-more-info close time to 7 days 4. delete old needs-more-info config * Use old, longer message * Fix label name	2022-08-09 14:50:50 +02:00
Adriane Boyd	e700358ba0	Add W605 to the errors raised by flake8 in the CI (#11283)	2022-08-09 12:15:13 +02:00
Adriane Boyd	fc4246558b	Fix regex invalid escape sequences (#11276)	2022-08-09 10:59:36 +02:00
stefawolf	23749cfc91	adding spans to doc_annotation in Example.to_dict (#11261 ) * adding spans to doc_annotation in Example.to_dict * to_dict compatible with from_dict: tuples instead of spans * use strings for label and kb_id * Simplify test * Update data formats docs Co-authored-by: Stefanie Wolf <stefanie.wolf@vitecsoftware.com> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-08-05 12:26:38 +02:00
Luka Dragar	b64243ed55	Updates to Slovenian language (#11162 ) * Added examples for Slovene * Update spacy/lang/sl/examples.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Corrected a typo in one of the sentences * Updated support for Slovenian * Some minor changes to corrections * Added forint currency * Corrected HYPHENS_PERMITTED regex and some formatting * Minor changes * Un-xfail tokenizer test * Format Co-authored-by: Luka Dragar <D20124481@mytudublin.ie> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-08-05 10:10:18 +02:00
Adriane Boyd	b5d9d0897e	Merge pull request #11270 from adrianeboyd/chore/update-develop-v3.5 Prepare develop for v3.5	2022-08-04 21:17:26 +02:00
Adriane Boyd	a3f6d6bce1	Merge remote-tracking branch 'upstream/master' into develop	2022-08-04 18:19:28 +02:00
Adriane Boyd	b07708d5d0	Support full prerelease versions in the compat table (#11228) * Support full prerelease versions in the compat table * Fix types	2022-08-04 15:14:19 +02:00
Jules Belveze	cd09614ab2	chore: add 'concepCy' to spacy universe (#11255) * chore: add 'concepCy' to spacy universe * docs: add 'slogan' to concepCy	2022-08-04 15:42:38 +09:00
Lj Miranda	d993df41e5	Update docs for pipeline initialize() methods (#11221 ) * Update documentation for dependency parser * Update documentation for trainable_lemmatizer * Update documentation for entity_linker * Update documentation for ner * Update documentation for morphologizer * Update documentation for senter * Update documentation for spancat * Update documentation for tagger * Update documentation for textcat * Update documentation for tok2vec * Run prettier on edited files * Apply similar changes in transformer docs * Remove need to say annotated example explicitly I removed the need to say "Must contain at least one annotated Example" because it's often a given that Examples will contain some gold-standard annotation. * Run prettier on transformer docs	2022-08-03 16:53:02 +02:00
Adriane Boyd	d0578c2ede	Add scorer to textcat API docs config settings (#11263)	2022-08-03 16:41:20 +02:00
Paul O'Leary McCann	2d89dd9db8	Update natto-py version spec (#11222) * Update natto-py version spec * Update setup.cfg Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2022-07-28 07:45:02 +02:00
ninjalu	95a1b8aca6	add additional REL_OP (#10371) * add additional REL_OP * change to condition and new rel_op symbols * add operators to docs * add the anchor while we're in here * add tests Co-authored-by: Peter Baumgartner <5107405+pmbaumgartner@users.noreply.github.com>	2022-07-27 13:16:44 +02:00
Madeesh Kannan	1829d7120a	`ExplosionBot`: Add note about case-sensitivity (#11211)	2022-07-27 14:24:22 +09:00
Edward	360a702ecd	Add parent argument (#11210)	2022-07-26 14:35:18 +02:00
Adriane Boyd	5c2a00cef0	Set version to v3.4.1 (#11209)	2022-07-26 12:52:38 +02:00
Adriane Boyd	c8f5b752bb	Add link to developer docs code conventions (#11171)	2022-07-26 10:56:53 +02:00
Daniël de Kok	4ee8a06149	Fix compatibility with CuPy 9.x (#11194) After the precomputable affine table of shape [nB, nF, nO, nP] is computed, padding with shape [1, nF, nO, nP] is assigned to the first row of the precomputed affine table. However, when we are indexing the precomputed table, we get a row of shape [nF, nO, nP]. CuPy versions before 10.0 cannot paper over this shape difference. This change fixes compatibility with CuPy < 10.0 by squeezing the first dimension of the padding before assignment.	2022-07-26 10:52:01 +02:00
Adriane Boyd	36ff2a5441	Merge pull request #11200 from adrianeboyd/chore/reenable-model-tests Revert "Temporarily skip tests that require models/compat"	2022-07-25 20:13:44 +02:00
Adriane Boyd	e5990db713	Revert "Temporarily skip tests that require models/compat" This reverts commit `d9320db7db`.	2022-07-25 18:12:18 +02:00
Paul O'Leary McCann	1c12812d1a	Replace link to old label (#11188)	2022-07-25 16:39:34 +09:00
Adriane Boyd	7a99fe3c65	Move sent-patterns to correct section of universe.json (#11192)	2022-07-25 09:14:50 +02:00
0xpeIpeI	93960dc4b5	[universe project] create English interpretation project (#11184) * [add] my universe project setting * [modify] A few adjustments * [Modify] change package description	2022-07-24 19:01:04 +09:00
Dan Radenkovic	a5aa3a818f	fix docs (#11123)	2022-07-24 17:16:36 +09:00

15998 Commits