spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-04-23 18:41:59 +03:00

Author	SHA1	Message	Date
Raphael Mitsch	49747697a2	Merge branch 'v4' into feature/docwise-generator-batching # Conflicts: # spacy/kb/kb.pyx # spacy/ml/models/entity_linker.py # spacy/pipeline/entity_linker.py # website/docs/api/inmemorylookupkb.mdx # website/docs/api/kb.mdx	2023-04-17 16:28:09 +02:00
Daniël de Kok	b734e5314d	Avoid `TrainablePipe.finish_update` getting called twice during training (#12450) * Avoid `TrainablePipe.finish_update` getting called twice during training PR #12136 fixed an issue where the tok2vec pipe was updated before gradients were accumulated. However, it introduced a new bug that caused `finish_update` to be called twice when using the training loop. This causes a fairly large slowdown. The `Language.update` method accepts the `sgd` argument for passing an optimizer. This argument has three possible values: - `Optimizer`: use the given optimizer to finish pipe updates. - `None`: use a default optimizer to finish pipe updates. - `False`: do not finish pipe updates. However, the latter option was not documented and not valid with the existing type of `sgd`. I assumed that this was a remnant of earlier spaCy versions and removed handling of `False`. However, with that change, we are passing `None` to `Language.update`. As a result, we were calling `finish_update` in both `Language.update` and in the training loop after all subbatches are processed. This change restores proper handling/use of `False`. Moreover, the role of `False` is now documented and added to the type to avoid future accidents. * Fix typo * Document defaults for `Language.update`	2023-03-30 09:30:42 +02:00
Edward	a653dec654	Add info that Vocab and StringStore are not static in docs (#12427) * Add size increase info about vocab and stringstore * Update website/docs/api/stringstore.mdx Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com> * Update website/docs/api/vocab.mdx Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com> * Change wording --------- Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>	2023-03-27 09:18:23 +02:00
Raphael Mitsch	3102e2e27a	Entity linking: use `SpanGroup` instead of `Iterable[Span]` for mentions (#12344) * Convert Candidate from Cython to Python class. * Format. * Fix .entity_ typo in _add_activations() usage. * Change type for mentions to look up entity candidates for to SpanGroup from Iterable[Span]. * Update docs. * Update spacy/kb/candidate.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update doc string of BaseCandidate.__init__(). * Update spacy/kb/candidate.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Rename Candidate to InMemoryCandidate, BaseCandidate to Candidate. * Adjust Candidate to support and mandate numerical entity IDs. * Format. * Fix docstring and docs. * Update website/docs/api/kb.mdx Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Rename alias -> mention. * Refactor Candidate attribute names. Update docs and tests accordingly. * Refactor Candidate attributes and their usage. * Format. * Fix mypy error. * Update error code in line with v4 convention. * Reverse erroneous changes during merge. * Update return type in EL tests. * Re-add Candidate to setup.py. * Format updated docs. --------- Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2023-03-20 12:25:18 +01:00
Raphael Mitsch	e5be5d6092	Merge branch 'v4' into feature/docwise-generator-batching # Conflicts: # spacy/kb/kb.pyx # spacy/kb/kb_in_memory.pyx # spacy/ml/models/entity_linker.py # spacy/pipeline/entity_linker.py # spacy/tests/pipeline/test_entity_linker.py # website/docs/api/inmemorylookupkb.mdx # website/docs/api/kb.mdx	2023-03-20 10:50:54 +01:00
Raphael Mitsch	cb79af3a10	Fix merge leftovers.	2023-03-20 10:31:11 +01:00
Raphael Mitsch	73bdeb01e4	Merge branch 'refactor/el-candidates' into feature/docwise-generator-batching # Conflicts: # spacy/kb/candidate.py # spacy/kb/kb.pyx # spacy/kb/kb_in_memory.pyx # spacy/ml/models/entity_linker.py # spacy/pipeline/entity_linker.py # spacy/tests/pipeline/test_entity_linker.py # website/docs/api/inmemorylookupkb.mdx # website/docs/api/kb.mdx	2023-03-20 10:24:17 +01:00
Raphael Mitsch	9340eb8ad2	Introduce hierarchy for EL `Candidate` objects (#12341) * Convert Candidate from Cython to Python class. * Format. * Fix .entity_ typo in _add_activations() usage. * Update spacy/kb/candidate.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update doc string of BaseCandidate.__init__(). * Update spacy/kb/candidate.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Rename Candidate to InMemoryCandidate, BaseCandidate to Candidate. * Adjust Candidate to support and mandate numerical entity IDs. * Format. * Fix docstring and docs. * Update website/docs/api/kb.mdx Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Rename alias -> mention. * Refactor Candidate attribute names. Update docs and tests accordingly. * Refactor Candidate attributes and their usage. * Format. * Fix mypy error. * Update error code in line with v4 convention. * Update spacy/kb/candidate.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Updated error code. * Simplify interface for int/str representations. * Update website/docs/api/kb.mdx Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Rename 'alias' to 'mention'. * Port Candidate and InMemoryCandidate to Cython. * Remove redundant entry in setup.py. * Add abstract class check. * Drop storing mention. * Update spacy/kb/candidate.pxd Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Fix entity_id refactoring problems in docstrings. * Drop unused InMemoryCandidate._entity_hash. * Update docstrings. * Move attributes out of Candidate. * Partially fix alias/mention terminology usage. Convert Candidate to interface. * Remove prior_prob from supported properties in Candidate. Introduce KnowledgeBase.supports_prior_probs(). * Update docstrings related to prior_prob. * Update alias/mention usage in doc(strings). * Update spacy/ml/models/entity_linker.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update spacy/ml/models/entity_linker.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Mention -> alias renaming. Drop Candidate.mentions(). Drop InMemoryLookupKB.get_alias_candidates() from docs. * Update docstrings. * Fix InMemoryCandidate attribute names. * Update spacy/kb/kb.pyx Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update spacy/ml/models/entity_linker.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update W401 test. * Update spacy/errors.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update spacy/kb/kb.pyx Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Use Candidate output type for toy generators in the test suite to mimic best practices * fix docs * fix import --------- Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2023-03-20 00:34:35 +01:00
Adriane Boyd	6ae7618418	Clean up Vocab constructor (#12290) * Clean up Vocab constructor * Change effective type of `strings` from `Iterable[str]` to `Optional[StringStore]` * Don't automatically add strings to vocab * Change default values to `None` * Remove `*deprecated_kwargs` Format	2023-03-19 23:41:20 +01:00
Sofie Van Landeghem	0365d3d2e2	fix docs	2023-03-19 23:31:02 +01:00
Raphael Mitsch	3cfc1c6acc	Mention -> alias renaming. Drop Candidate.mentions(). Drop InMemoryLookupKB.get_alias_candidates() from docs.	2023-03-15 09:23:31 +01:00
Raphael Mitsch	28dbed64cb	Update alias/mention usage in doc(strings).	2023-03-14 13:33:05 +01:00
Raphael Mitsch	1ba2fc4207	Update website/docs/api/kb.mdx Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2023-03-09 12:01:42 +01:00
Madeesh Kannan	520279ff7c	`Tok2Vec`: Add `distill` method (#12108) * `Tok2Vec`: Add `distill` method * `Tok2Vec`: Refactor `update` * Add `Tok2Vec.distill` test * Update `distill` signature to accept `Example`s instead of separate teacher and student docs * Add docs * Remove docstring * Update test * Remove `update` calls from test * Update `Tok2Vec.distill` docstring	2023-03-09 09:37:19 +01:00
Raphael Mitsch	8dbb74c9c0	Merge branch 'v4' into refactor/el-candidates	2023-03-07 09:06:51 +01:00
Raphael Mitsch	8b24f31b65	Format docs.	2023-03-06 10:21:37 +01:00
Raphael Mitsch	f33f0ed160	Merge branch 'v4' into feature/docwise-generator-batching # Conflicts: # spacy/pipeline/entity_linker.py # website/docs/api/entitylinker.mdx	2023-03-06 10:21:12 +01:00
Raphael Mitsch	e4e55b88b3	Update leftover get_candidates() mention in docs.	2023-03-06 10:08:10 +01:00
Raphael Mitsch	bb7418ebdd	Modify EL batching system.	2023-03-06 10:05:46 +01:00
Raphael Mitsch	94e57d0ed5	Refactor Candidate attribute names. Update docs and tests accordingly.	2023-03-03 11:08:17 +01:00
Raphael Mitsch	61bacf81bd	Update website/docs/api/kb.mdx Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2023-03-03 09:54:28 +01:00
Raphael Mitsch	3beda2b23a	Merge branch 'refactor/el-candidates' into refactor/span-group-for-mentions # Conflicts: # spacy/ml/models/entity_linker.py # website/docs/api/inmemorylookupkb.mdx	2023-03-03 08:32:38 +01:00
Raphael Mitsch	1ea31552be	Merge branch 'master' into sync/master-into-v4 # Conflicts: # requirements.txt # spacy/pipeline/entity_linker.py # spacy/util.py # website/docs/api/entitylinker.mdx	2023-03-02 16:24:15 +01:00
Raphael Mitsch	6aa6b86d49	Make generation of empty `KnowledgeBase` instances configurable in `EntityLinker` (#12320) * Make empty_kb() configurable. * Format. * Update docs. * Be more specific in KB serialization test. * Update KB serialization tests. Update docs. * Remove doc update for batched candidate generation. * Fix serialization of subclassed KB in tests. * Format. * Update docstring. * Update docstring. * Switch from pickle to json for custom field serialization.	2023-03-01 16:02:55 +01:00
Adriane Boyd	da75896ef5	Return Tuple[Span] for all Doc/Span attrs that provide spans (#12288) * Return Tuple[Span] for all Doc/Span attrs that provide spans * Update Span types	2023-03-01 16:00:02 +01:00
kadarakos	56aa0cc75f	Displacy doc fix (#12352) * more details for color setting * more details for color setting * prettier	2023-03-01 15:38:23 +01:00
Raphael Mitsch	9bd498cdae	Fix docstring and docs.	2023-03-01 15:09:24 +01:00
Raphael Mitsch	49abf4fb3a	Rename Candidate to InMemoryCandidate, BaseCandidate to Candidate.	2023-03-01 14:27:50 +01:00
Raphael Mitsch	efbc3d37b3	Update docs w.r.t. spacy.CandidateBatchGenerator.v1. (#12350)	2023-03-01 11:01:35 +01:00
Adriane Boyd	33864f1d07	Add new tags in docs for #12334 (#12348)	2023-03-01 10:46:13 +01:00
Raphael Mitsch	50b34751eb	Update docs.	2023-02-28 15:38:28 +01:00
TAN Long	071667376a	Add new REL_OPs: `>+`, `>-`, `<+`, and `<-` (#12334) * Add immediate left/right child/parent dependency relations * Add tests for new REL_OPs: `>+`, `>-`, `<+`, and `<-`. --------- Co-authored-by: Tan Long <tanloong@foxmail.com>	2023-02-28 14:36:33 +01:00
Adriane Boyd	4539fbae17	Revert "Fix FUZZY operator definition (#12318)" (#12336) This reverts commit `daedc45d05`. The default length depends on the length of the pattern string and was correct for this example.	2023-02-27 09:48:36 +01:00
Adriane Boyd	df4c069a13	Remove backoff from .vector to .tensor (#12292)	2023-02-23 11:36:50 +01:00
andyjessen	daedc45d05	Fix FUZZY operator definition (#12318) * Fix FUZZY operator definition The default length of the FUZZY operator is 2 and not 3. * adjust edit distance in matcher usage docs too --------- Co-authored-by: svlandeg <svlandeg@github.com>	2023-02-23 09:37:40 +01:00
Adriane Boyd	b95123060a	Make Span.char_span optional args keyword-only (#12257) * Make Span.char_span optional args keyword-only * Make kb_id and following kw-only * Format	2023-02-15 12:34:33 +01:00
Raphael Mitsch	2d4fb94ba0	Fix wrong file name in docs for rule-based matcher. (#12262)	2023-02-09 12:58:14 +01:00
Adriane Boyd	cbc2ae933e	Remove unused Span.char_span(id=) (#12250)	2023-02-08 14:46:07 +01:00
Adriane Boyd	cf85b81f34	Remove names for vectors (#12243) * Remove names for vectors Named vectors are basically a carry-over from v2 and aren't used for anything. * Format	2023-02-08 14:37:42 +01:00
Raphael Mitsch	d38a88f0f3	Remove negation. (#12252)	2023-02-08 14:18:33 +01:00
Sofie Van Landeghem	c47ec5b5c6	Merge pull request #12218 from adrianeboyd/chore/update-v4-from-master-7 Update v4 from master	2023-02-03 12:04:20 +01:00
Paul O'Leary McCann	89f974d4f5	Cleanup/remove backwards compat overwrite settings (#11888) * Remove backwards-compatible overwrite from Entity Linker This also adds a docstring about overwrite, since it wasn't present. * Fix docstring * Remove backward compat settings in Morphologizer This also needed a docstring added. For this component it's less clear what the right overwrite settings are. * Remove backward compat from sentencizer This was simple * Remove backward compat from senter Another simple one * Remove backward compat setting from tagger * Add docstrings * Update spacy/pipeline/morphologizer.pyx Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update docs --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-02-02 14:13:38 +01:00
Adriane Boyd	cd95b29053	Merge remote-tracking branch 'upstream/master' into chore/update-v4-from-master-7	2023-02-02 13:06:15 +01:00
Sofie Van Landeghem	4c60afb946	Backslash fixes in docs (#12213) * backslash fixes * revert unrelated change	2023-02-01 10:15:38 +01:00
Edward	360ccf628a	Rename language codes (Icelandic, multi-language) (#12149) * Init * fix tests * Update spacy/errors.py Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Fix test_blank_languages * Rename xx to mul in docs * Format _util with black * prettier formatting --------- Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2023-01-31 17:30:43 +01:00
Daniël de Kok	6b07be2110	Add `Language.distill` (#12116) * Add `Language.distill` This method is the distillation counterpart of `Language.update`. It takes a teacher `Language` instance and distills the student pipes on the teacher pipes. * Apply suggestions from code review Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com> * Clarify how Example is used in distillation * Update transition parser distill docstring for examples argument * Pass optimizer to `TrainablePipe.distill` * Annotate pipe before update As discussed internally, we want to let a pipe annotate before doing an update with gold/silver data. Otherwise, the output may be (too) informed by the gold/silver data. * Rename `component_map` to `student_to_teacher` * Better synopsis in `Language.distill` docstring * `name` -> `student_name` * Fix labels type in docstring * Mark distill test as slow * Fix `student_to_teacher` type in docs --------- Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>	2023-01-30 12:44:11 +01:00
Paul O'Leary McCann	8932f4dc35	Add extra flag to assets docs (#12194) * Add extra flag to assets docs For some reason this wasn't included. * Add new tag to docs	2023-01-30 10:05:23 +01:00
Adriane Boyd	ec45f704b1	Drop python 3.6/3.7, remove unneeded compat (#12187) * Drop python 3.6/3.7, remove unneeded compat * Remove unused import * Minimal python 3.8+ docs updates	2023-01-27 15:48:20 +01:00
Sofie Van Landeghem	bd739e67d6	explain KB change and how to remedy (#12189)	2023-01-27 15:13:20 +01:00
Adriane Boyd	5f8a398bb9	Add span_id to Span.char_span, update Doc/Span.char_span docs (#12196) * Add span_id to Span.char_span, update Doc/Span.char_span docs `Span.char_span(id=)` should be removed in the future. * Also use Union[int, str] in Doc docstring	2023-01-27 15:09:17 +01:00

3114 Commits
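
Commit b734e5314d describes the tri-state `sgd` argument of `Language.update`: an `Optimizer`, `None` for a default optimizer, or `False` to skip finishing the update. The minimal sketch below is self-contained and only models that description. `ToyOptimizer`, `ToyPipe`, `toy_update` and `toy_train_step` are hypothetical stand-ins, not spaCy code. It shows why a training loop that passes `None` for each sub-batch and then finishes the update itself calls `finish_update` once per sub-batch plus once more, while passing `False` results in a single call.

```python
from typing import List, Literal, Union


class ToyOptimizer:
    """Hypothetical stand-in for an optimizer object."""


class ToyPipe:
    """Hypothetical trainable pipe that counts finish_update calls."""

    def __init__(self) -> None:
        self.finish_update_calls = 0

    def finish_update(self, sgd: ToyOptimizer) -> None:
        self.finish_update_calls += 1


SgdArg = Union[ToyOptimizer, None, Literal[False]]


def toy_update(pipes: List[ToyPipe], sgd: SgdArg = None) -> None:
    # ... gradients for this (sub)batch would be accumulated here ...
    if sgd is False:
        # The caller takes responsibility for finishing the update.
        return
    if sgd is None:
        # Fall back to a default optimizer.
        sgd = ToyOptimizer()
    for pipe in pipes:
        pipe.finish_update(sgd)


def toy_train_step(
    pipes: List[ToyPipe], n_subbatches: int, optimizer: ToyOptimizer, sgd_for_subbatches: SgdArg
) -> None:
    # Accumulate gradients over all sub-batches...
    for _ in range(n_subbatches):
        toy_update(pipes, sgd=sgd_for_subbatches)
    # ...then finish the update once for the whole batch.
    for pipe in pipes:
        pipe.finish_update(optimizer)


fixed = ToyPipe()
toy_train_step([fixed], 3, ToyOptimizer(), sgd_for_subbatches=False)
buggy = ToyPipe()
toy_train_step([buggy], 3, ToyOptimizer(), sgd_for_subbatches=None)
print(fixed.finish_update_calls, buggy.finish_update_calls)
```

With three sub-batches, the `False` variant finishes each pipe's update once. The `None` variant finishes it four times: three from the sub-batches and one from the loop. This mirrors the slowdown described in the commit message.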
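
Commits daedc45d05 and 4539fbae17 concern the default edit distance of the `FUZZY` matcher operator. The revert notes that the default depends on the length of the pattern string. The sketch below assumes the default is `max(2, round(0.3 * len(pattern)))`, which is our reading of spaCy's matcher: at least two edits, so that one transposition (two plain edits) is allowed, and otherwise about 30% of the pattern length. `levenshtein`, `default_max_edits` and `fuzzy_match` are hypothetical helpers written for illustration, not spaCy's functions.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(
                min(
                    prev[j] + 1,  # deletion
                    curr[j - 1] + 1,  # insertion
                    prev[j - 1] + (ca != cb),  # substitution (0 if equal)
                )
            )
        prev = curr
    return prev[-1]


def default_max_edits(pattern: str) -> int:
    # Assumed default: at least 2 edits, else roughly 30% of the pattern length.
    return max(2, round(0.3 * len(pattern)))


def fuzzy_match(text: str, pattern: str, max_edits: int = -1) -> bool:
    # A negative max_edits means "use the length-dependent default".
    if max_edits < 0:
        max_edits = default_max_edits(pattern)
    return levenshtein(text, pattern) <= max_edits


for word in ["definately", "definitly", "defnitely", "deftly"]:
    print(word, fuzzy_match(word, "definitely"))
```

Under this assumption, a 10-character pattern such as "definitely" allows 3 edits, while short patterns stay at the floor of 2.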
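
Commit 071667376a adds four `DependencyMatcher` relation operators for "immediate left/right child/parent" relations. Assuming the semantics given in spaCy's dependency-matcher documentation, each check combines a head test with an adjacency test:

- `A >+ B`: B is the right immediate child of A (A is B's head and `A.i == B.i - 1`).
- `A >- B`: B is the left immediate child of A (A is B's head and `A.i == B.i + 1`).
- `A <+ B`: B is the right immediate parent of A (B is A's head and `A.i == B.i - 1`).
- `A <- B`: B is the left immediate parent of A (B is A's head and `A.i == B.i + 1`).

`check_rel_op` below is a hypothetical helper over a plain list of head indices, not spaCy's implementation.

```python
from typing import List


def check_rel_op(op: str, heads: List[int], a: int, b: int) -> bool:
    """Test the relation `a <op> b` over token indices, where heads[i] is
    the index of token i's syntactic head (a root token is its own head)."""
    if op == ">+":  # b is the right immediate child of a
        return heads[b] == a and a == b - 1
    if op == ">-":  # b is the left immediate child of a
        return heads[b] == a and a == b + 1
    if op == "<+":  # b is the right immediate parent of a
        return heads[a] == b and a == b - 1
    if op == "<-":  # b is the left immediate parent of a
        return heads[a] == b and a == b + 1
    raise ValueError(f"unsupported operator: {op}")


# Toy parse of "She eats green apples": "eats" (1) is the root,
# "She" (0) and "apples" (3) attach to "eats", "green" (2) attaches to "apples".
heads = [1, 1, 3, 1]
print(check_rel_op(">-", heads, 1, 0))  # "She" directly left of its head "eats"
print(check_rel_op("<+", heads, 2, 3))  # head "apples" directly right of "green"
print(check_rel_op(">+", heads, 1, 3))  # "apples" is a right child, but not adjacent
```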