spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-07-03 19:33:19 +03:00

Author	SHA1	Message	Date
Adriane Boyd	a803af9dfa	Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-1	2021-10-26 11:53:50 +02:00
Daniël de Kok	f31ac6fd4f	Print a warning when multiprocessing is used on a GPU (#9475 ) * Raise an error when multiprocessing is used on a GPU As reported in #5507, a confusing exception is thrown when multiprocessing is used with a GPU model and the `fork` multiprocessing start method: cupy.cuda.runtime.CUDARuntimeError: cudaErrorInitializationError: initialization error This change checks whether one of the models uses the GPU when multiprocessing is used. If so, raise a friendly error message. Even though multiprocessing can work on a GPU with the `spawn` method, it quickly runs the GPU out-of-memory on real-world data. Also, multiprocessing on a single GPU typically does not provide large performance gains. * Move GPU multiprocessing check to Language.pipe * Warn rather than error when using multiprocessing with GPU models * Improve GPU multiprocessing warning message. Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Reduce API assumptions Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update spacy/language.py * Update spacy/language.py * Test that warning is thrown with GPU + multiprocessing Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-10-21 16:14:23 +02:00
Sofie Van Landeghem	5a38f79f18	Custom component types in spacy.ty (#9469 ) * add custom protocols in spacy.ty * add a test for the new types in spacy.ty * import Example when type checking * some type fixes * put Protocol in compat * revert update check back to hasattr * runtime_checkable in compat as well	2021-10-21 15:31:06 +02:00
Connor Brinton	657af5f91f	🏷 Add Mypy check to CI and ignore all existing Mypy errors (#9167 ) * 🚨 Ignore all existing Mypy errors * 🏗 Add Mypy check to CI * Add types-mock and types-requests as dev requirements * Add additional type ignore directives * Add types packages to dev-only list in reqs test * Add types-dataclasses for python 3.6 * Add ignore to pretrain * 🏷 Improve type annotation on `run_command` helper The `run_command` helper previously declared that it returned an `Optional[subprocess.CompletedProcess]`, but it isn't actually possible for the function to return `None`. These changes modify the type annotation of the `run_command` helper and remove all now-unnecessary `# type: ignore` directives. * 🔧 Allow variable type redefinition in limited contexts These changes modify how Mypy is configured to allow variables to have their type automatically redefined under certain conditions. The Mypy documentation contains the following example: ```python def process(items: List[str]) -> None: # 'items' has type List[str] items = [item.split() for item in items] # 'items' now has type List[List[str]] ... ``` This configuration change is especially helpful in reducing the number of `# type: ignore` directives needed to handle the common pattern of: * Accepting a filepath as a string * Overwriting the variable using `filepath = ensure_path(filepath)` These changes enable redefinition and remove all `# type: ignore` directives rendered redundant by this change. 
* 🏷 Add type annotation to converters mapping * 🚨 Fix Mypy error in convert CLI argument verification * 🏷 Improve type annotation on `resolve_dot_names` helper * 🏷 Add type annotations for `Vocab` attributes `strings` and `vectors` * 🏷 Add type annotations for more `Vocab` attributes * 🏷 Add loose type annotation for gold data compilation * 🏷 Improve `_format_labels` type annotation * 🏷 Fix `get_lang_class` type annotation * 🏷 Loosen return type of `Language.evaluate` * 🏷 Don't accept `Scorer` in `handle_scores_per_type` * 🏷 Add `string_to_list` overloads * 🏷 Fix non-Optional command-line options * 🙈 Ignore redefinition of `wandb_logger` in `loggers.py` * ➕ Install `typing_extensions` in Python 3.8+ The `typing_extensions` package states that it should be used when "writing code that must be compatible with multiple Python versions". Since SpaCy needs to support multiple Python versions, it should be used when newer `typing` module members are required. One example of this is `Literal`, which is available starting with Python 3.8. Previously SpaCy tried to import `Literal` from `typing`, falling back to `typing_extensions` if the import failed. However, Mypy doesn't seem to be able to understand what `Literal` means when the initial import fails. Therefore, these changes modify how `compat` imports `Literal` by always importing it from `typing_extensions`. These changes also modify how `typing_extensions` is installed, so that it is a requirement for all Python versions, including those greater than or equal to 3.8. * 🏷 Improve type annotation for `Language.pipe` These changes add a missing overload variant to the type signature of `Language.pipe`. Additionally, the type signature is enhanced to allow type checkers to differentiate between the two overload variants based on the `as_tuples` parameter. 
Fixes #8772 * ➖ Don't install `typing-extensions` in Python 3.8+ After more detailed analysis of how to implement Python version-specific type annotations using SpaCy, it has been determined that branching on a comparison against `sys.version_info` can be statically analyzed by Mypy well enough to enable us to conditionally use `typing_extensions.Literal`. This means that we no longer need to install `typing_extensions` for Python versions greater than or equal to 3.8! 🎉 These changes revert previous changes installing `typing-extensions` regardless of Python version and modify how we import the `Literal` type to ensure that Mypy treats it properly. * resolve mypy errors for Strict pydantic types * refactor code to avoid missing return statement * fix types of convert CLI command * avoid list-set confusion in debug_data * fix typo and formatting * small fixes to avoid type ignores * fix types in profile CLI command and make it more efficient * type fixes in projects CLI * put one ignore back * type fixes for render * fix render types - the sequel * fix BaseDefault in language definitions * fix type of noun_chunks iterator - yields tuple instead of span * fix types in language-specific modules * 🏷 Expand accepted inputs of `get_string_id` `get_string_id` accepts either a string (in which case it returns its ID) or an ID (in which case it immediately returns the ID). These changes extend the type annotation of `get_string_id` to indicate that it can accept either strings or IDs. * 🏷 Handle override types in `combine_score_weights` The `combine_score_weights` function allows users to pass an `overrides` mapping to override data extracted from the `weights` argument. Since it allows `Optional` dictionary values, the return value may also include `Optional` dictionary values. These changes update the type annotations for `combine_score_weights` to reflect this fact. 
* 🏷 Fix tokenizer serialization method signatures in `DummyTokenizer` * 🏷 Fix redefinition of `wandb_logger` These changes fix the redefinition of `wandb_logger` by giving a separate name to each `WandbLogger` version. For backwards-compatibility, `spacy.train` still exports `wandb_logger_v3` as `wandb_logger` for now. * more fixes for typing in language * type fixes in model definitions * 🏷 Annotate `_RandomWords.probs` as `NDArray` * 🏷 Annotate `tok2vec` layers to help Mypy * 🐛 Fix `_RandomWords.probs` type annotations for Python 3.6 Also remove an import that I forgot to move to the top of the module 😅 * more fixes for matchers and other pipeline components * quick fix for entity linker * fixing types for spancat, textcat, etc * bugfix for tok2vec * type annotations for scorer * add runtime_checkable for Protocol * type and import fixes in tests * mypy fixes for training utilities * few fixes in util * fix import * 🐵 Remove unused `# type: ignore` directives * 🏷 Annotate `Language._components` * 🏷 Annotate `spacy.pipeline.Pipe` * add doc as property to span.pyi * small fixes and cleanup * explicit type annotations instead of via comment Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com> Co-authored-by: svlandeg <svlandeg@github.com>	2021-10-14 15:21:40 +02:00
Adriane Boyd	d98d525bc8	Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.1-3	2021-10-14 09:41:46 +02:00
Elia Robyn Lake (Robyn Speer)	53b5f245ed	Allow IETF language codes, aliases, and close matches (#9342 ) * use language-matching to allow language code aliases Signed-off-by: Elia Robyn Speer <elia@explosion.ai> * link to "IETF language tags" in docs Signed-off-by: Elia Robyn Speer <elia@explosion.ai> * Make requirements consistent Signed-off-by: Elia Robyn Speer <elia@explosion.ai> * change "two-letter language ID" to "IETF language tag" in language docs Signed-off-by: Elia Robyn Speer <elia@explosion.ai> * use langcodes 3.2 and handle language-tag errors better Signed-off-by: Elia Robyn Speer <elia@explosion.ai> * all unknown language codes are ImportErrors Signed-off-by: Elia Robyn Speer <elia@explosion.ai> Co-authored-by: Elia Robyn Speer <elia@explosion.ai>	2021-10-05 09:52:22 +02:00
Adriane Boyd	4192e71599	Sync vocab in vectors and components sourced in configs (#9335 ) Since a component may reference anything in the vocab, share the full vocab when loading source components and vectors (which will include `strings` as of #8909). When loading a source component from a config, save and restore the vocab state after loading source pipelines, in particular to preserve the original state without vectors, since `[initialize.vectors] = null` skips rather than resets the vectors. The vocab references are not synced for components loaded with `Language.add_pipe(source=)` because the pipelines are already loaded and not necessarily with the same vocab. A warning could be added in `Language.create_pipe_from_source` that it may be necessary to save and reload before training, but it's a rare enough case that this kind of warning may be too noisy overall.	2021-10-04 12:19:02 +02:00
Adriane Boyd	e750c1760c	Restore tokenization timing in Language.evaluate (#9305 ) Restore tokenization timing steps that were accidentally removed in #6765.	2021-09-27 20:44:14 +02:00
Adriane Boyd	03f234b739	Merge remote-tracking branch 'upstream/master' into develop	2021-09-27 09:10:45 +02:00
Adriane Boyd	2f0bb77920	Accept Doc input in pipelines (#9069 ) * Accept Doc input in pipelines Allow `Doc` input to `Language.__call__` and `Language.pipe`, which skips `Language.make_doc` and passes the doc directly to the pipeline. * ensure_doc helper function * avoid running multiple processes on GPU * Update spacy/tests/test_language.py Co-authored-by: svlandeg <svlandeg@github.com>	2021-09-22 09:41:05 +02:00
Sofie Van Landeghem	1e974de837	config is not Optional (#9024 )	2021-08-27 11:44:31 +02:00
Ines Montani	d94ddd5686	Auto-detect package dependencies in spacy package (#8948 ) * Auto-detect package dependencies in spacy package * Add simple get_third_party_dependencies test * Import packages_distributions explicitly * Inline packages_distributions * Fix docstring [ci skip] * Relax catalogue requirement * Move importlib_metadata to spacy.compat with note * Include license information [ci skip]	2021-08-17 14:05:13 +02:00
Adriane Boyd	941a591f3c	Pass excludes when serializing vocab (#8824 ) * Pass excludes when serializing vocab Additional minor bug fix: * Deserialize vocab in `EntityLinker.from_disk` * Add test for excluding strings on load * Fix formatting	2021-08-03 14:42:44 +02:00
Adriane Boyd	9ad3b8cf8d	Only add sourced vectors hashes to meta if necessary (#8830 )	2021-08-02 18:22:35 +02:00
Adriane Boyd	e532c69475	Update Language.replace_pipe for disabled components (#8729 ) * Fix the index where the replacement is inserted to account for disabled components * Allow `Language.replace_pipe` to replace disabled components	2021-07-19 18:06:12 +10:00
Ines Montani	f90482d077	Tidy up and auto-format	2021-07-18 15:44:56 +10:00
explosion-bot	334f1f98d8	Auto-format code with black	2021-07-09 08:06:06 +00:00
Luca Dorigo	e8ef4a46d5	Add the right return type for Language.pipe and an overload for the as_tuples case (#8441 ) * Add the right return type for Language.pipe and an overload for the as_tuples version * Reformat, tidy up Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2021-07-06 14:18:40 +02:00
Adriane Boyd	5fd0b5207e	Fix vectors check for sourced components (#8559 ) * Fix vectors check for sourced components Since vectors are not loaded when components are sourced, store a hash for the vectors of each sourced component and compare it to the loaded vectors after the vectors are loaded from the `[initialize]` block. * Pop temporary info * Remove stored hash in remove_pipe * Add default for pop * Add additional convert/debug/assemble CLI tests	2021-07-06 12:43:17 +02:00
Adriane Boyd	86d01e9229	Tidy up with flake8: imports, comparisons, etc.	2021-06-28 12:08:15 +02:00
Adriane Boyd	5eeb25f043	Tidy up code	2021-06-28 12:08:15 +02:00
Adriane Boyd	7abfa25035	Don't use the same vocab for source models (#8388 ) * Don't use the same vocab for source models The source models should not be loaded with the vocab from the current pipeline because this loads the vectors from the source model into the current vocab. The strings are all copied in `Language.create_pipe_from_source`, so if the vectors are configured correctly in the current pipeline, the sourced component will work as expected. If there is a vector mismatch, a warning is shown. (It's not possible to inspect whether the vectors are actually used by the component, so a warning is the best option.) * Update comment on source model loading	2021-06-21 09:33:33 +02:00
Adriane Boyd	5646fcbe46	Merge remote-tracking branch 'upstream/develop' into chore/develop-into-master-v3.1	2021-06-15 15:05:17 +02:00
Adriane Boyd	9dfd3c9484	Use warnings.warn instead of logger.warning	2021-06-04 17:44:08 +02:00
Narayan Acharya	6b79714080	Address missing config overrides post load of models (#8208 )	2021-05-31 18:36:52 +10:00
Ines Montani	5957ab74f7	Merge pull request #8112 from svlandeg/bugfix/replace-trf	2021-05-28 11:35:17 +10:00
Adriane Boyd	b120fb3511	Handle errors while multiprocessing (#8004 ) * Handle errors while multiprocessing Handle errors while multiprocessing without hanging. * Return the traceback for errors raised while processing a batch, which can be handled by the top-level error handler * Allow for shortened batches due to custom error handlers that ignore errors and skip documents * Define custom components at a higher level * Also move up custom error handler * Use simpler component for test * Switch error type * Adjust test * Only call top-level error handler for exceptions * Register custom test components within tests Use global functions (so they can be pickled) but register the components only within the individual tests.	2021-05-17 13:28:39 +02:00
svlandeg	235e9f5488	call replace_listener_cfg attr if it's available	2021-05-12 17:19:38 +02:00
svlandeg	44a3a58599	call replace_listener attr if it's available	2021-05-12 16:01:02 +02:00
Santiago Castro	e99ff6f255	Fix typo in Language docstrings (#7958 )	2021-05-03 14:44:09 +02:00
Adriane Boyd	95c0833656	Add training option to set annotations on update (#7767 ) * Add training option to set annotations on update Add a `[training]` option called `set_annotations_on_update` to specify a list of components for which the predicted annotations should be set on `example.predicted` immediately after that component has been updated. The predicted annotations can be accessed by later components in the pipeline during the processing of the batch in the same `update` call. * Rename to annotates / annotating_components * Add test for `annotating_components` when training from config * Add documentation	2021-04-26 16:53:53 +02:00
Adriane Boyd	1ad646cbcf	Improve checks for sourced components (#7490 ) * Improve checks for sourced components * Remove language class checks * Convert python warning to logger warning * Remove unused warning * Fix formatting	2021-04-19 18:36:32 +10:00
Adriane Boyd	82d3caf861	Implement replace_listeners for source in config (#7620 ) Implement replace_listeners for sourced components loaded from a config.	2021-04-08 18:21:22 +10:00
Paul O'Leary McCann	40bc01e668	Proactively remove unused listeners With this the changes in initialize.py might be unnecessary. Requires testing.	2021-03-17 22:41:41 +09:00
Adriane Boyd	d746ea6278	Add warning about GPU selection in Jupyter notebooks (#7075 ) * Initial warning * Update check * Redo edit * Move jupyter warning to helper method * Add link with details to warnings	2021-03-09 15:35:21 +01:00
Sofie Van Landeghem	39de3602e0	return custom error in nlp.initialize (#7104 ) * return custom error in nlp.initialize * Rename error Co-authored-by: Ines Montani <ines@ines.io>	2021-03-09 23:01:31 +11:00
Sofie Van Landeghem	cd70c3cb79	Fixing pretrain (#7342 ) * initialize NLP with train corpus * add more pretraining tests * more tests * function to fetch tok2vec layer for pretraining * clarify parameter name * test different objectives * formatting * fix check for static vectors when using vectors objective * clarify docs * logger statement * fix init_tok2vec and proc.initialize order * test training after pretraining * add init_config tests for pretraining * pop pretraining block to avoid config validation errors * custom errors	2021-03-09 14:01:13 +11:00
Adriane Boyd	e43d43db32	Allow sourcing disabled components (#7215 ) Check `component_names` instead of `pipe_names` to allow sourcing disabled components.	2021-02-26 13:50:56 +01:00
Sofie Van Landeghem	f638306598	remove link_components flag again (#6883 )	2021-02-02 10:08:40 +08:00
Sofie Van Landeghem	acabb284dd	Fix linking resumed components (#6859 ) * link components across enabled, resumed and frozen * revert renaming * revert renaming, the sequel	2021-02-01 22:19:58 +11:00
Ines Montani	d0c3775712	Replace links to nightly docs [ci skip]	2021-01-30 20:09:38 +11:00
Ines Montani	e6accb3a9e	Tidy up and auto-format	2021-01-30 12:52:33 +11:00
Ines Montani	7886d59c56	Add check for remove_listener method	2021-01-29 23:47:30 +11:00
Ines Montani	94232aea08	Improve E889	2021-01-29 23:39:23 +11:00
Ines Montani	e766e8c56d	Apply suggestions from code review Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-01-29 21:41:17 +11:00
Ines Montani	325f47500d	Move replacement logic to Language.from_config	2021-01-29 19:37:04 +11:00
Ines Montani	99842387cb	Remove default value	2021-01-29 18:45:37 +11:00
Ines Montani	44b5542d14	Change method order	2021-01-29 18:42:41 +11:00
Ines Montani	8c15d1daec	Update and validate config first and exit early if paths don't exist	2021-01-29 18:24:47 +11:00
Ines Montani	bbb94b37c6	Update error handling and docstring	2021-01-29 16:27:49 +11:00
Ines Montani	01ecfbcc45	Merge branch 'develop' into feature/replace-listeners	2021-01-29 15:57:32 +11:00
Ines Montani	911dfcccfc	Add option to replace listeners for sourced components	2021-01-29 15:57:04 +11:00
Sofie Van Landeghem	837a4f53c2	Error handling in nlp.pipe (#6817 ) * add error handler for pipe methods * add unit tests * remove pipe method that are the same as their base class * have Language keep track of a default error handler * cleanup * formatting * small refactor * add documentation	2021-01-29 08:51:21 +08:00
Ines Montani	fabd3a3394	Tidy up code comments [ci skip]	2021-01-27 12:40:03 +11:00
Sofie Van Landeghem	57640aa838	warn when frozen components break listener pattern (#6766 ) * warn when frozen components break listener pattern * few notes in the documentation * update arg name * formatting * cleanup * specify listeners return type	2021-01-20 11:12:35 +11:00
Matthew Honnibal	88acbfc050	Copy the Example objects (and their predicted Doc) in nlp.evaluate() and nlp.update() (#6765 ) * Make copy of examples in nlp.update and nlp.evaluate * Avoid circular import * Fix evaluate	2021-01-19 16:47:44 +01:00
Sofie Van Landeghem	bfc212e68f	fix duplicate from merge [ci skip]	2021-01-19 12:14:35 +01:00
Adriane Boyd	c8b4370865	Add all strings from source models (#6736 ) Add all strings from the source model when adding a pipe from a source model. Minor: * Skip `disable=["vocab", "tokenizer"]` when loading a source model from the config, since this doesn't do anything and is misleading.	2021-01-16 12:26:15 +11:00
Adriane Boyd	a45d89f09a	Add initialize.before_init and after_init callbacks Add `initialize.before_init` and `initialize.after_init` callbacks to the config. The `initialize.before_init` callback is a place to implement one-time tokenizer customizations that are then saved with the model.	2021-01-12 13:07:44 +01:00
Adriane Boyd	b57be94c78	Fix memory issues in Language.evaluate (#6386 ) * Fix memory issues in Language.evaluate Reset annotation in predicted docs before evaluating and store all data in `examples`. * Minor refactor to docs generator init * Fix generator expression * Fix final generator check * Refactor pipeline loop * Handle examples generator in Language.evaluate * Add test with generator * Use make_doc	2020-12-31 10:45:50 +11:00
Adriane Boyd	6ee6e41234	Update docstring for Language.evaluate	2020-12-09 10:21:39 +01:00
Adriane Boyd	fa8fa474a3	Add nlp.batch_size setting Add a default `batch_size` setting for `Language.pipe` and `Language.evaluate` as `nlp.batch_size`.	2020-12-09 09:13:26 +01:00
Ines Montani	1980203229	Merge branch 'master' into pr/6444	2020-12-09 11:09:40 +11:00
Adriane Boyd	e931d3f72b	Move max_length to nlp.make_doc() (#6512 ) Move max_length check to `nlp.make_doc()` so that it's also checked for `nlp.pipe()`.	2020-12-08 14:24:02 +08:00
Ines Montani	539b0c10da	Tidy up and auto-format	2020-10-10 19:14:48 +02:00
svlandeg	8316bc7d4a	bugfix DisabledPipes	2020-10-09 12:06:20 +02:00
Sofie Van Landeghem	d093d6343b	TrainablePipe (#6213 ) * rename Pipe to TrainablePipe * split functionality between Pipe and TrainablePipe * remove unnecessary methods from certain components * cleanup * hasattr(component, "pipe") should be sufficient again * remove serialization and vocab/cfg from Pipe * unify _ensure_examples and validate_examples * small fixes * hasattr checks for self.cfg and self.vocab * make is_resizable and is_trainable properties * serialize strings.json instead of vocab * fix KB IO + tests * fix typos * more typos * _added_strings as a set * few more tests specifically for _added_strings field * bump to 3.0.0a36	2020-10-08 21:33:49 +02:00
svlandeg	3e2e1fd323	cleanup	2020-10-08 10:37:32 +02:00
svlandeg	eaf5c265cb	set_kb method for entity_linker	2020-10-08 10:34:01 +02:00
svlandeg	6b8bdb2d39	add init_config to nlp.create_pipe	2020-10-07 14:58:16 +02:00
svlandeg	ff9ac39c88	read entity_ruler patterns with srsly.read_jsonl.v1	2020-10-05 22:50:14 +02:00
svlandeg	4e3ace4b8c	is_trainable method	2020-10-05 17:43:42 +02:00
svlandeg	65abd77779	add finish_update to Pipe	2020-10-05 16:23:33 +02:00
Ines Montani	8f018e47f8	Adjust [initialize.components] on Language.remove_pipe and Language.rename_pipe	2020-10-04 14:43:45 +02:00
Ines Montani	ae15c9de79	Raise error from caught KeyError to preserve traceback	2020-10-03 11:43:56 +02:00
Stanislav Schmidt	3589a64d44	Change type of texts argument in pipe to iterable (#6186 ) * Change type of texts argument in pipe to iterable * Add contributor agreement	2020-10-02 21:00:11 +02:00
svlandeg	02247cccaf	Merge remote-tracking branch 'upstream/develop' into feature/small-fixes	2020-10-02 20:48:11 +02:00
svlandeg	6787e56315	print debugging warning before raising error if model not properly initialized	2020-10-01 09:21:00 +02:00
Ines Montani	fa47f87924	Tidy up and auto-format	2020-09-29 21:39:28 +02:00
Ines Montani	798040bc1d	Fix language detection	2020-09-29 21:08:13 +02:00
Matthew Honnibal	8ce9f44433	Merge branch 'feature/prepare' of https://github.com/explosion/spaCy into feature/prepare	2020-09-29 16:57:38 +02:00
Matthew Honnibal	ca72608059	Fix language	2020-09-29 16:48:33 +02:00
Ines Montani	fd594cfb9b	Tighten up format	2020-09-29 16:47:55 +02:00
Ines Montani	63d1598137	Simplify config use in Language.initialize	2020-09-29 16:05:48 +02:00
Ines Montani	adca08a12f	Pass nlp forward	2020-09-29 12:21:52 +02:00
Ines Montani	42f0e4c946	Clean up	2020-09-29 12:14:08 +02:00
Matthew Honnibal	9c8b2524fe	Upd initialize args	2020-09-29 12:08:37 +02:00
Matthew Honnibal	f2d1b7feb5	Clean up sgd	2020-09-29 12:00:08 +02:00
Ines Montani	78396d137f	Integrate initialize settings	2020-09-29 11:57:08 +02:00
Ines Montani	dec984a9c1	Update Language.initialize and support components/tokenizer settings	2020-09-29 11:52:45 +02:00
Matthew Honnibal	5276db6f3f	Remove 'device' argument from Language, clean up 'sgd' arg	2020-09-29 11:42:19 +02:00
Ines Montani	ff9a63bfbd	begin_training -> initialize	2020-09-28 21:35:09 +02:00
Ines Montani	658fad428a	Fix base schema integration	2020-09-27 22:50:36 +02:00
Ines Montani	7e938ed63e	Update config resolution to use new Thinc	2020-09-27 22:21:31 +02:00
Ines Montani	4bbe41f017	Fix combined scores and update test	2020-09-24 10:42:47 +02:00
Ines Montani	ae51f580c1	Fix handling of score_weights	2020-09-24 10:27:33 +02:00
Ines Montani	1114219ae3	Tidy up and auto-format	2020-09-21 10:59:07 +02:00
Adriane Boyd	47080fba98	Minor renaming / refactoring * Rename loader to `spacy.LookupsDataLoader.v1`, add debugging message * Make `Vocab.lookups` a property	2020-09-18 19:43:19 +02:00
Adriane Boyd	eed4b785f5	Load vocab lookups tables at beginning of training Similar to how vectors are handled, move the vocab lookups to be loaded at the start of training rather than when the vocab is initialized, since the vocab doesn't have access to the full config when it's created. The option moves from `nlp.load_vocab_data` to `training.lookups`. Typically these tables will come from `spacy-lookups-data`, but any `Lookups` object can be provided. The loading from `spacy-lookups-data` is now strict, so configs for each language should specify the exact tables required. This also makes it easier to control whether the larger clusters and probs tables are included. To load `lexeme_norm` from `spacy-lookups-data`: ``` [training.lookups] @misc = "spacy.LoadLookupsData.v1" lang = ${nlp.lang} tables = ["lexeme_norm"] ```	2020-09-18 15:59:16 +02:00
Matthew Honnibal	c776594ab1	Fix	2020-09-16 18:15:14 +02:00
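
Commit 657af5f91f above describes importing `Literal` by branching on a comparison against `sys.version_info`, which Mypy can analyze statically. The following is a minimal sketch of that general pattern, not the code from spaCy's `compat` module. The `Mode` alias and `describe` function are hypothetical names added for illustration, and the fallback branch assumes `typing_extensions` is installed on Python versions below 3.8.

```python
import sys

# Branch on sys.version_info rather than try/except ImportError,
# so a static type checker can tell which import applies.
if sys.version_info >= (3, 8):
    from typing import Literal
else:
    # Assumption: typing_extensions is available on Python < 3.8.
    from typing_extensions import Literal

# Hypothetical alias and function, for illustration only.
Mode = Literal["train", "eval"]


def describe(mode: Mode) -> str:
    """Return a short description of the given mode."""
    return f"running in {mode} mode"
```

With this layout, a type checker would be expected to reject a call such as `describe("test")`, while the runtime behavior is the same on every supported Python version.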


660 Commits