spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-09-23 04:26:46 +03:00

Author	SHA1	Message	Date
Paul O'Leary McCann	ba6a37d358	Document Assigned Attributes of Pipeline Components (#9041 ) * Add textcat docs * Add NER docs * Add Entity Linker docs * Add assigned fields docs for the tagger This also adds a preamble, since there wasn't one. * Add morphologizer docs * Add dependency parser docs * Update entityrecognizer docs This is a little weird because `Doc.ents` is the only thing assigned to, but it's actually a bidirectional property. * Add token fields for entityrecognizer * Fix section name * Add entity ruler docs * Add lemmatizer docs * Add sentencizer/recognizer docs * Update website/docs/api/entityrecognizer.md Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update website/docs/api/entityruler.md Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update website/docs/api/tagger.md Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update website/docs/api/entityruler.md Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update type for Doc.ents This was `Tuple[Span, ...]` everywhere but `Tuple[Span]` seems to be correct. * Run prettier * Apply suggestions from code review Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Run prettier * Add transformers section This basically just moves and renames the "custom attributes" section from the bottom of the page to be consistent with "assigned attributes" on other pages. I looked at moving the paragraph just above the section into the section, but it includes the unrelated registry additions, so it seemed better to leave it unchanged. * Make table header consistent Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-09-01 12:09:39 +02:00
Paul O'Leary McCann	f803a84571	Fix inference of epoch_resume (#9084 ) * Fix inference of epoch_resume When an epoch_resume value is not specified individually, it can often be inferred from the filename. The value inference code was there but the value wasn't passed back to the training loop. This also adds a specific error in the case where no epoch_resume value is provided and it can't be inferred from the filename. * Add new error * Always use the epoch resume value if specified Before this the value in the filename was used if found	2021-09-01 14:17:42 +09:00
Sofie Van Landeghem	a17b06d18b	allow typer 0.4 (#9089 )	2021-08-31 20:53:51 +10:00
Davide Fiocco	1dd69be1f1	Fix point typo on docbin docs (#9097 )	2021-08-31 10:55:44 +02:00
Ines Montani	1a86d545af	Update references to contributor agreement [ci skip]	2021-08-31 10:03:38 +10:00
Sofie Van Landeghem	5af88427a2	Dev docs: listeners (#9061 ) * Start Listeners documentation * intro table of different architectures * initialization, linking, dim inference * internal comm (WIP) * expand internal comm section * frozen components and replacing listeners * various small fixes * fix content table * fix link	2021-08-30 14:56:35 +02:00
Adriane Boyd	1e9b4b55ee	Pass overrides to subcommands in workflows (#9059 ) * Pass overrides to subcommands in workflows * Add missing docstring	2021-08-30 09:23:54 +02:00
Paul O'Leary McCann	6ff8d90070	Merge pull request #9081 from mjhajharia/patch-1 benepar usage example has deprecated imports	2021-08-29 14:41:52 +09:00
Meenal Jhajharia	2613f0e98f	benepar usage example has deprecated imports	2021-08-28 16:35:58 +05:30
Sofie Van Landeghem	1e974de837	config is not Optional (#9024 )	2021-08-27 11:44:31 +02:00
github-actions[bot]	fb9c31fbda	Auto-format code with black (#9065 ) Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>	2021-08-27 11:42:27 +02:00
Sofie Van Landeghem	4d39430b82	Document use-case of freezing tok2vec (#8992 ) * update error msg * add sentence to docs * expand note on frozen components	2021-08-26 09:50:35 +02:00
Sofie Van Landeghem	94fb840443	fix docs for Span constructor arguments (#9023 )	2021-08-25 16:06:22 +02:00
David Strouk	31e9b126a0	Fix verbs list in lang/fr/tokenizer_exceptions.py (#9033 )	2021-08-25 15:55:09 +02:00
Ines Montani	4cd052e81d	Include component factories in third-party dependencies resolver (#9009 ) * Include component factories in third-party dependencies resolver * Increment catalogue and update test	2021-08-25 14:58:01 +02:00
Sofie Van Landeghem	e1f88de729	bump to 3.1.2 (#9008 )	2021-08-20 12:41:09 +02:00
Sofie Van Landeghem	4d52d7051c	Fix spancat training on nested entities (#9007 ) * overfitting test on non-overlapping entities * add failing overfitting test for overlapping entities * failing test for list comprehension * remove test that was put in separate PR * bugfix * cleanup	2021-08-20 12:37:50 +02:00
Paul O'Leary McCann	9cc3dc2b67	Add glossary entry for _SP (#8983 )	2021-08-20 12:04:02 +02:00
Sofie Van Landeghem	de025beb5f	Warn and document spangroup.doc weakref (#8980 ) * test for error after Doc has been garbage collected * warn about using a SpanGroup when the Doc has been garbage collected * add warning to the docs * rephrase slightly * raise error instead of warning * update * move warning to doc property	2021-08-20 11:06:19 +02:00
Paul O'Leary McCann	37fe847af4	Fix type annotation in docs	2021-08-20 15:34:22 +09:00
Ines Montani	f2b61b77a5	Fix universe.json [ci skip]	2021-08-20 11:26:29 +10:00
Ines Montani	894e16f5ca	Merge pull request #9003 from bbieniek/add-spacy-api-v3 [ci skip]	2021-08-20 11:23:30 +10:00
Baltazar	4d85cb88a5	added contribution license	2021-08-19 21:45:18 +02:00
Baltazar	71e65fe943	added spacy api v3 docker	2021-08-19 21:29:25 +02:00
Adriane Boyd	6722dc3dc5	Fix allow_overlap default for spancat scoring (#8970 ) * Remove irrelevant default options	2021-08-18 09:56:56 +02:00
Steele Farnsworth	b18cb1cd2a	Refactor dependencymatcher.pyx to use list comps and enumerate. (#8956 ) * Refactor to use list comps and enumerate. Replace loops that append to a list with list comprehensions where this does not change the behavior; replace range(len(...)) loops with enumerate. Correct one typo in a comment. Replace a call to set() with a set literal. * Undo double assignment. Expand `tokens_to_key[j] = k = self._get_matcher_key(key, i, j)` to two statements. Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Sign contributors agreement Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-08-18 09:55:45 +02:00
Ines Montani	d94ddd5686	Auto-detect package dependencies in spacy package (#8948 ) * Auto-detect package dependencies in spacy package * Add simple get_third_party_dependencies test * Import packages_distributions explicitly * Inline packages_distributions * Fix docstring [ci skip] * Relax catalogue requirement * Move importlib_metadata to spacy.compat with note * Include license information [ci skip]	2021-08-17 14:05:13 +02:00
Sofie Van Landeghem	0a6b68848f	Fix making span_group (#8975 ) * fix _make_span_group * fix imports	2021-08-17 10:36:34 +02:00
Ines Montani	593a22cf2d	Add development docs for Language and code conventions (#8745 ) * WIP: add dev docs for Language / config [ci skip] * Add section on initialization [ci skip] * Fix wording [ci skip] * Add code conventions WIP [ci skip] * Update code convention docs [ci skip] * Update contributing guide and conventions [ci skip] * Update Code Conventions.md [ci skip] * Clarify sourced components + vectors * Apply suggestions from code review [ci skip] Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update wording and add link [ci skip] * restructure slightly + extended index * remove paragraph that breaks flow and is repeated in more detail later * fix anchors Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>	2021-08-17 09:38:15 +02:00
Paul O'Leary McCann	9391998c77	Add notes on preparing training data to docs (#8964 ) * Add training data section Not entirely sure this is in the right location on the page - maybe it should be after quickstart? * Add pointer from binary format to training data section * Minor cleanup * Add to ToC, fix filename * Update website/docs/usage/training.md Co-authored-by: Ines Montani <ines@ines.io> * Update website/docs/usage/training.md Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update website/docs/usage/training.md Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Move the training data section further down the page * Update website/docs/usage/training.md Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update website/docs/usage/training.md Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Run prettier Co-authored-by: Ines Montani <ines@ines.io> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-08-16 17:37:21 +02:00
Ines Montani	a894fe0440	Merge pull request #8951 from HLasse/master	2021-08-16 11:41:32 +10:00
Lasse	839ea0f987	change tags formatting to match	2021-08-13 14:40:08 +02:00
Lasse	70ab596f61	Merge branch 'master' of https://github.com/HLasse/spaCy	2021-08-13 14:35:21 +02:00
Lasse	195e4e48c3	add textdescriptives to universe	2021-08-13 14:35:18 +02:00
github-actions[bot]	92071326d8	Auto-format code with black (#8950 ) Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>	2021-08-13 11:48:38 +02:00
Adriane Boyd	8448c7dbc5	Update da trf recommendation (#8921 ) Update the da trf recommendation to the same model used in the pretrained pipelines.	2021-08-12 13:54:02 +02:00
Ines Montani	6260f044cc	Merge pull request #8938 from explosion/docs/prodigy-v1-11-project [ci skip] Update Prodigy project template for v1.11	2021-08-12 21:16:49 +10:00
Ines Montani	4f769ff913	Update Prodigy project template for v1.11 [ci skip]	2021-08-12 13:46:20 +10:00
Paul O'Leary McCann	e227d24d43	Allow passing in array vars for speedup (#8882 ) * Allow passing in array vars for speedup This fixes #8845. Not sure about the docstring changes here... * Update docs Types maybe need more detail? Maybe not? * Run prettier on docs * Update spacy/tokens/span.pyx Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-08-10 15:13:53 +02:00
Paul O'Leary McCann	6029cfc391	Add scores to output in spancat (#8855 ) * Add scores to output in spancat This exposes the scores as an attribute on the SpanGroup. Includes a basic test. * Add basic doc note * Vectorize score calcs * Add "annotation format" section * Update website/docs/api/spancategorizer.md Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Clean up doc section * Ran prettier on docs * Get arrays off the gpu before iterating over them * Remove int() calls Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2021-08-10 13:47:49 +02:00
Ines Montani	a1e9f19460	Merge pull request #8910 from DuyguA/patch-1 [ci skip] updated unv json for new book	2021-08-09 23:12:50 +10:00
Duygu Altinok	380b2817cf	updated unv json for new book	2021-08-09 12:39:22 +02:00
Paul O'Leary McCann	cac298471f	Fix #8902 (bad link in docs) typo fix	2021-08-08 22:04:00 +09:00
Eduard Zorita	439f30faad	Add stub files for main cython classes (#8427 ) * Add stub files for main API classes * Add contributor agreement for ezorita * Update types for ndarray and hash() * Fix __getitem__ and __iter__ * Add attributes of Doc and Token classes * Overload type hints for Span.__getitem__ * Fix type hint overload for Span.__getitem__ Co-authored-by: Luca Dorigo <dorigoluca@gmail.com>	2021-08-07 12:30:03 +02:00
github-actions[bot]	56d4d87aeb	Auto-format code with black (#8895 ) Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>	2021-08-06 13:38:06 +02:00
Kabir Khan	1dfffe5fb4	No output info message in train (#8885 ) * Add info message that no output directory was provided in train * Update train.py * Fix logging	2021-08-05 09:21:22 +02:00
Adriane Boyd	fa2e7a4bbf	Fix spancat tests on GPU (#8872 ) * Fix spancat tests on GPU * Fix more spancat tests	2021-08-04 14:29:43 +02:00
Paul O'Leary McCann	77d698dcae	Fix check for RIGHT_ATTRS in dep matcher (#8807 ) * Fix check for RIGHT_ATTRs in dep matcher If a non-anchor node does not have RIGHT_ATTRS, the dep matcher throws an E100, which says that non-anchor nodes must have LEFT_ID, REL_OP, and RIGHT_ID. It specifically does not say RIGHT_ATTRS is required. A blank RIGHT_ATTRS is also valid, and patterns with one will be accepted. While not normal, sometimes a REL_OP is enough to specify a non-anchor node - maybe you just want the head of another node unconditionally, for example. This change just sets RIGHT_ATTRS to {} if not present. Alternatively changing E100 to state RIGHT_ATTRS is required could also be reasonable. * Fix test This test was written on the assumption that if `RIGHT_ATTRS` isn't present an error will be raised. Since the proposed changes make it so an error won't be raised this is no longer necessary. * Revert test, update error message Error message now lists missing keys, and RIGHT_ATTRS is required. * Use list of required keys in error message Also removes unused key param arg.	2021-08-04 09:20:41 +02:00
Adriane Boyd	941a591f3c	Pass excludes when serializing vocab (#8824 ) * Pass excludes when serializing vocab Additional minor bug fix: * Deserialize vocab in `EntityLinker.from_disk` * Add test for excluding strings on load * Fix formatting	2021-08-03 14:42:44 +02:00
Adriane Boyd	175847f92c	Support list values and INTERSECTS in Matcher (#8784 ) * Support list values and IS_INTERSECT in Matcher * Support list values as token attributes for set operators, not just as pattern values. * Add `IS_INTERSECT` operator. * Fix incorrect `ISSUBSET` and `ISSUPERSET` in schema and docs. * Rename IS_INTERSECT to INTERSECTS	2021-08-02 19:39:26 +02:00
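
The set operators named in the last entry above (`ISSUBSET`, `ISSUPERSET`, `INTERSECTS`) can be read as plain set relations between a token attribute value and the list given in a pattern. The following is a minimal sketch of those relations in pure Python. It is not spaCy's implementation, and the helper names (`as_set`, `is_subset`, `is_superset`, `intersects`) are hypothetical, chosen only for illustration. It assumes that a single attribute value is treated as a one-element set, and a list value as the set of its items.

```python
from typing import Any, Iterable, Set


def as_set(value: Any) -> Set[Any]:
    """Hypothetical helper: view a single value as a one-element set,
    and a list/tuple/set as the set of its items."""
    if isinstance(value, (list, tuple, set, frozenset)):
        return set(value)
    return {value}


def is_subset(attr_value: Any, pattern_values: Iterable[Any]) -> bool:
    """True if every attribute item appears in the pattern list."""
    return as_set(attr_value) <= set(pattern_values)


def is_superset(attr_value: Any, pattern_values: Iterable[Any]) -> bool:
    """True if every pattern item appears among the attribute items."""
    return as_set(attr_value) >= set(pattern_values)


def intersects(attr_value: Any, pattern_values: Iterable[Any]) -> bool:
    """True if the attribute items and the pattern list share at least one item."""
    return bool(as_set(attr_value) & set(pattern_values))
```

Under these assumptions, `intersects(["a", "b"], ["b", "c"])` holds because both sides contain `"b"`, while `is_subset(["a", "b"], ["b", "c"])` does not, since `"a"` is missing from the pattern list.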


14794 Commits