spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-11-11 04:08:09 +03:00

Author	SHA1	Message	Date
Adriane Boyd	0ee2ae86bf	Update trf quickstart recommendations Add/update trf recommendations for Bengali, Hindi, Sinhala, and Tamil based on #7044.	2021-02-12 15:55:17 +01:00
Ines Montani	26bf642afd	Fix issue #7019: Handle None scores in evaluate printer (#7026)	2021-02-11 16:45:23 +11:00
Ines Montani	c08b3f294c	Support env vars and CLI overrides for project.yml	2021-02-10 13:45:27 +11:00
svlandeg	f852af2acf	add capture arg	2021-02-02 19:47:12 +01:00
Sofie Van Landeghem	f319d2765f	Add capture argument to project_run (#6878) * add capture argument to project_run and run_commands * git bump to 3.0.1 * Set version to 3.0.1.dev0 Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>	2021-02-02 10:11:15 +08:00
Ines Montani	a59f3fcf5d	Make wheel the default format and update docs [ci skip]	2021-02-01 23:18:43 +11:00
Ines Montani	b9573e9e22	Fix pip args	2021-02-01 23:15:00 +11:00
Ines Montani	b46073234a	Fix default clone branch and error handling [ci skip]	2021-02-01 22:29:04 +11:00
Adriane Boyd	35a863cd27	Remove nlp.tokenizer from quickstart template Remove `nlp.tokenizer` from quickstart template so that the default language-specific tokenizer settings are filled instead.	2021-02-01 11:20:12 +01:00
Ines Montani	f058cbd751	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2021-01-30 21:03:25 +11:00
Ines Montani	3435b894df	Remove nightly reference from auto docs [ci skip]	2021-01-30 20:12:08 +11:00
Ines Montani	d0c3775712	Replace links to nightly docs [ci skip]	2021-01-30 20:09:38 +11:00
Ines Montani	b26a3daa9a	Merge pull request #6860 from explosion/feature/package-wheel	2021-01-30 14:17:01 +11:00
Ines Montani	2332c4280b	Update and use unified --build option	2021-01-30 13:11:36 +11:00
Ines Montani	e6accb3a9e	Tidy up and auto-format	2021-01-30 12:52:33 +11:00
Ines Montani	30765674d0	Merge branch 'master' into develop	2021-01-30 12:20:28 +11:00
Ines Montani	2609ba4e89	Support building wheel in spacy package	2021-01-30 11:54:02 +11:00
Pamphile ROY	41ee75ac6d	Remove --no-cache-dir when downloading models When `--no-cache-dir` is present, it prevents caching to properly function. If the user still wants to do this, there is the possibility to pass options with `user_pip_args`. But you should not enforce options like these. In my case this is preventing some docker build (using buildkit caching) to have proper caching of models.	2021-01-29 15:37:44 +01:00
Ines Montani	78d6ff4dd4	Update quickstart recommendations	2021-01-28 11:14:49 +11:00
Ines Montani	ec5f55aa5b	Update config generation defaults and transformers (#6832)	2021-01-27 23:56:33 +11:00
Ines Montani	c0926c9088	WIP: Various small training changes (#6818) * Allow output_path to be None during training * Fix cat scoring (?) * Improve error message for weighted None score * Improve messages So we can call this in other places etc. * FIx output path check * Use latest wasabi * Revert "Improve error message for weighted None score" This reverts commit `7059926763`. * Exclude None scores from final score by default It's otherwise very difficult to keep track of the score weights if we modify a config programmatically, source components etc. * Update warnings and use logger.warning	2021-01-26 14:51:52 +11:00
Adriane Boyd	0f2de39efb	Fix types for exclude args in info CLI (#6808)	2021-01-25 20:00:22 +08:00
KeshavG-lb	0a86d833d7	Spacy Cli info method causing backward compatibility issues (#6793) * Spacy Cli info method causing backward compatibility issues #6791 fix backward compatibility by setting default value to exclude in info method. * setting empty list as default argument is dangerous. so setting default to None and then setting it to emptylist, if None. Reference : https://nikos7am.com/posts/mutable-default-arguments/	2021-01-23 11:21:43 +01:00
Ines Montani	b0b743597c	Tidy up and auto-format	2021-01-15 11:57:36 +11:00
Ines Montani	e8a97a2bd6	Merge pull request #6720 from adrianeboyd/feature/improved-init-training-config-validation	2021-01-15 11:45:24 +11:00
Adriane Boyd	5fb8b7037a	Expand initialize/training config validation Validate both `[initialize]` and `[training]` in `debug data` and `nlp.initialize()` with separate config validation error blocks that indicate which block of the config is being validated.	2021-01-12 17:17:00 +01:00
svlandeg	1abeca90a6	refer to _parser_internals.nonproj.DELIMITER	2021-01-07 18:58:13 +01:00
Sofie Van Landeghem	75d9019343	Fix types of Tok2Vec encoding architectures (#6442) * fix TorchBiLSTMEncoder documentation * ensure the types of the encoding Tok2vec layers are correct * update references from v1 to v2 for the new architectures	2021-01-07 16:39:27 +11:00
Sofie Van Landeghem	afc5714d32	multi-label textcat component (#6474) * multi-label textcat component * formatting * fix comment * cleanup * fix from #6481 * random edit to push the tests * add explicit error when textcat is called with multi-label gold data * fix error nr * small fix	2021-01-06 13:07:14 +11:00
Ines Montani	6f83abb971	Merge pull request #6647 from svlandeg/feature/init_config_overwrite	2021-01-05 14:59:04 +11:00
Ines Montani	81f018fb67	Merge pull request #6671 from explosion/chore/tidy-autoformat Tidy up and auto-format	2021-01-05 14:45:31 +11:00
Ines Montani	a9e845426f	Use --force for consistency and add docs	2021-01-05 13:49:59 +11:00
Ines Montani	991669c934	Tidy up and auto-format	2021-01-05 13:41:53 +11:00
svlandeg	712a78b74a	add simple unit test	2020-12-30 12:35:26 +01:00
svlandeg	4347e6d39b	fixes for CLI info command	2020-12-30 12:05:58 +01:00
svlandeg	62b4fe118f	prevent overwriting existing config file	2020-12-29 15:40:22 +01:00
Tim Gates	292c1d6a73	docs: fix simple typo, speficied -> specified (#6611) There is a small typo in spacy/cli/info.py. Should read `specified` rather than `speficied`.	2020-12-22 09:14:10 +01:00
Sofie Van Landeghem	282a3b49ea	Fix parser resizing when there is no upper layer (#6460) * allow resizing of the parser model even when upper=False * update from spacy.TransitionBasedParser.v1 to v2 * bugfix	2020-12-18 18:56:57 +08:00
Ines Montani	3f90bffa27	Merge pull request #6571 from adrianeboyd/bugfix/debug-data-missing-vectors Fix alignment and vector checks in debug data	2020-12-17 10:10:47 +11:00
Adriane Boyd	1ddf2f39c7	Switch converters to generator functions (#6547) * Switch converters to generator functions To reduce the memory usage when converting large corpora, refactor the convert methods to be generator functions. * Update tests	2020-12-15 16:47:16 +08:00
Adriane Boyd	20e18cc246	Fix alignment and vector checks in debug data * Update token alignment check to use Example alignment * Update missing vector check further related to changes in v3	2020-12-15 09:43:14 +01:00
Ines Montani	513c4e332a	Include custom code via spacy package command (#6531)	2020-12-10 20:36:46 +08:00
Ines Montani	2a6043fabb	Merge pull request #6530 from explosion/feature/init-config-cpu-gpu	2020-12-10 09:38:46 +11:00
Ines Montani	9d32e839d3	Merge branch 'develop' into feature/init-config-cpu-gpu	2020-12-10 08:50:53 +11:00
Adriane Boyd	fa8fa474a3	Add nlp.batch_size setting Add a default `batch_size` setting for `Language.pipe` and `Language.evaluate` as `nlp.batch_size`.	2020-12-09 09:13:26 +01:00
Ines Montani	758ad6c3cd	Make CPU the default for init config	2020-12-09 11:00:51 +11:00
Ines Montani	5d605d539d	Remove output_file from init_config helper	2020-12-09 10:57:55 +11:00
svlandeg	8f8a7f1733	returning config in init_config	2020-12-08 17:37:20 +01:00
Ines Montani	6c7a930ee8	Fix variable	2020-12-08 20:44:59 +11:00
Ines Montani	94a5a9814f	Update argument handling and documentation	2020-12-08 20:41:18 +11:00
Adriane Boyd	5ceac425ee	Remove non-working --use-chars from train CLI Remove the non-working `--use-chars` option from the train CLI. The implementation of the option across component types and the CLI settings could be fixed, but the `CharacterEmbed` model does not work on GPU in v2 so it's better to remove it.	2020-12-08 08:30:00 +01:00
Ines Montani	d25b1606d6	Allow reading config from sdtin in spacy train	2020-12-08 18:01:40 +11:00
Adriane Boyd	78085fab1f	Check for spacy-nightly package in download (#6502) Also check for spacy-nightly in download so that `--no-deps` isn't set for normal nightly installs.	2020-12-04 09:40:03 +01:00
Adriane Boyd	591cd48aa8	Remove config.cfg from MANIFEST	2020-12-01 12:58:02 +01:00
Adriane Boyd	b0dd13e0ba	Support LICENSE in spacy package If present, include the file `input_dir/LICENSE` at the top level of the packaged model.	2020-11-30 13:43:58 +01:00
Ines Montani	9beba7164f	Make jinja2 top-level import No problem anymore since it's now an official dependency	2020-11-27 15:17:14 +08:00
Adriane Boyd	573f5c863f	Fix tag map clobbering in spacy train (#6437) Fix bug from #5768 where the tag map is clobbered if a custom tag map isn't provided.	2020-11-24 13:13:16 +01:00
Sofie Van Landeghem	a0c899a0ff	Fix textcat + transformer architecture (#6371) * add pooling to textcat TransformerListener * maybe_get_dim in case it's null	2020-11-10 20:14:47 +08:00
Adriane Boyd	a4b32b9552	Handle missing reference values in scorer (#6286 ) * Handle missing reference values in scorer Handle missing values in reference doc during scoring where it is possible to detect an unset state for the attribute. If no reference docs contain annotation, `None` is returned instead of a score. `spacy evaluate` displays `-` for missing scores and the missing scores are saved as `None`/`null` in the metrics. Attributes without unset states: * `token.head`: relies on `token.dep` to recognize unset values * `doc.cats`: unable to handle missing annotation Additional changes: * add optional `has_annotation` check to `score_scans` to replace `doc.sents` hack * update `score_token_attr_per_feat` to handle missing and empty morph representations * fix bug in `Doc.has_annotation` for normalization of `IS_SENT_START` vs. `SENT_START` * Fix import * Update return types	2020-11-03 15:47:18 +01:00
Ines Montani	2c9804038d	Fix success message [ci skip]	2020-10-23 16:11:54 +02:00
Adriane Boyd	563a21834e	Save raw scores in evaluate output	2020-10-19 15:49:09 +02:00
Adriane Boyd	dd207ca6d0	Add dep_las_per_type and more generic PRF printer	2020-10-19 15:49:02 +02:00
Adriane Boyd	4300858ecb	Include per-type/feat scores in evaluate output	2020-10-19 15:48:55 +02:00
Sofie Van Landeghem	75a202ce65	TextCat updates and fixes (#6263 ) * small fix in example imports * throw error when train_corpus or dev_corpus is not a string * small fix in custom logger example * limit macro_auc to labels with 2 annotations * fix typo * also create parents of output_dir if need be * update documentation of textcat scores * refactor TextCatEnsemble * fix tests for new AUC definition * bump to 3.0.0a42 * update docs * rename to spacy.TextCatEnsemble.v2 * spacy.TextCatEnsemble.v1 in legacy * cleanup * small fix * update to 3.0.0rc2 * fix import that got lost in merge * cursed IDE * fix two typos	2020-10-18 14:50:41 +02:00
Adriane Boyd	c8d04b79e2	Sort and add vectors for langs without transformers	2020-10-16 08:25:16 +02:00
Adriane Boyd	2fbd43c603	Use core lg models as vectors models in quickstart	2020-10-16 08:17:53 +02:00
Ines Montani	1f49300862	Update transformer recommendations [ci skip]	2020-10-13 15:41:17 +02:00
svlandeg	e972ecba72	add utf8 encoding for opening file	2020-10-09 16:03:14 +02:00
Sofie Van Landeghem	241cd112f5	add reenabled pipe names back to the meta before serializing (#6219)	2020-10-08 00:44:16 +02:00
svlandeg	9b4cf7b0b6	update output of debug config command	2020-10-06 09:47:23 +02:00
Ines Montani	181039bd17	Merge pull request #6205 from explosion/feature/embed-features	2020-10-05 21:49:10 +02:00
Matthew Honnibal	b7e01d2024	Fix quickstart	2020-10-05 21:21:30 +02:00
Matthew Honnibal	ff8b980775	Upd quickstart template	2020-10-05 21:19:41 +02:00
Ines Montani	0135f6ed95	Enable commit check via env var	2020-10-05 20:51:15 +02:00
Ines Montani	d58fb42707	Add spacy_version option and validation for project.yml	2020-10-05 20:00:42 +02:00
Ines Montani	84fedcebab	Make args keyword-only [ci skip] Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>	2020-10-05 17:07:35 +02:00
Ines Montani	6958510bda	Include spaCy version check in project CLI	2020-10-05 13:53:07 +02:00
Ines Montani	bcd52e5486	Tidy up errors and warnings	2020-10-04 11:16:31 +02:00
Ines Montani	3bc3c05fcc	Tidy up and auto-format	2020-10-03 17:20:18 +02:00
Matthew Honnibal	db419f6b2f	Improve control of training progress and logging (#6184) * Make logging and progress easier to control * Update docs * Cleanup errors * Fix ConfigValidationError * Pass stdout/stderr, not wasabi.Printer * Fix type * Upd logging example * Fix logger example * Fix type	2020-10-03 14:57:46 +02:00
Adriane Boyd	22158dc24a	Add morphologizer to quickstart template	2020-10-02 15:06:16 +02:00
Ines Montani	f2627157c8	Update docs [ci skip]	2020-10-01 17:38:17 +02:00
Ines Montani	7f68f4bd92	Hide jsonl_loc on init vectors and tidy up [ci skip]	2020-10-01 16:44:17 +02:00
Ines Montani	0a8a124a6e	Update docs [ci skip]	2020-10-01 12:15:53 +02:00
Ines Montani	44160cd52f	Tidy up [ci skip]	2020-10-01 10:41:19 +02:00
Matthew Honnibal	59294e91aa	Restore the 'jsonl' arg for init vectors The lexemes.jsonl file is still used in our English vectors, and it may be required by users as well. I think it's worth supporting the option.	2020-09-30 19:06:50 +02:00
Ines Montani	23c63eefaf	Tidy up env vars [ci skip]	2020-09-30 15:15:11 +02:00
Elijah Rippeth	4cbb954281	reorder so tagmap is replaced only if a custom file is provided. (#6164) * reorder so tagmap is replaced only if a custom file is provided. * Remove unneeded variable initialization Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2020-09-30 13:26:06 +02:00
Ines Montani	a5debb356d	Tidy up and adjust logging [ci skip]	2020-09-30 01:22:08 +02:00
Ines Montani	56a2f778c4	Add logging [ci skip]	2020-09-30 01:08:55 +02:00
Ines Montani	fe3f111c37	Merge pull request #6168 from explosion/fix/default-corpus-values	2020-09-30 00:24:02 +02:00
Ines Montani	ae51843468	Remove augmenter from jinja template [ci skip]	2020-09-29 23:08:50 +02:00
Ines Montani	9bb958fd0a	Fix debug data [ci skip]	2020-09-29 23:07:11 +02:00
Ines Montani	df8dd91b6f	Merge branch 'develop' into fix/default-corpus-values	2020-09-29 22:55:39 +02:00
Ines Montani	0a1ee109db	Remove init form path	2020-09-29 22:53:18 +02:00
Ines Montani	c334a7d45f	Remove	2020-09-29 22:38:39 +02:00
Ines Montani	1aeef3bfbb	Make corpus paths default to None and improve errors	2020-09-29 22:33:46 +02:00
Ines Montani	0250bcf6a3	Show validation error during init	2020-09-29 22:29:09 +02:00
Ines Montani	43c92ec8c9	Resolve dir for better output [ci skip]	2020-09-29 22:01:04 +02:00
Ines Montani	fa47f87924	Tidy up and auto-format	2020-09-29 21:39:28 +02:00
Ines Montani	604be54a5c	Support --code in evaluate CLI [ci skip]	2020-09-29 21:20:56 +02:00
Ines Montani	d3c63b7965	Merge branch 'develop' into feature/prepare	2020-09-29 20:53:05 +02:00
Ines Montani	2be80379ec	Fix small issues, resolve_dot_names and debug model	2020-09-29 20:38:35 +02:00
Ines Montani	71a0ee274a	Move init labels to init pipeline module	2020-09-29 18:09:33 +02:00
Ines Montani	534e1ef498	Fix template	2020-09-29 17:02:55 +02:00
Matthew Honnibal	10847c7f4e	Fix arg	2020-09-29 16:48:07 +02:00
Matthew Honnibal	e70a00fa76	Remove unnecessary warning from train	2020-09-29 16:47:54 +02:00
Matthew Honnibal	3f0d61232d	Remove outdated arg from train	2020-09-29 16:47:44 +02:00
Matthew Honnibal	e957d66b92	Merge branch 'feature/prepare' of https://github.com/explosion/spaCy into feature/prepare	2020-09-29 16:22:53 +02:00
Matthew Honnibal	45daf5c9fe	Add init labels command	2020-09-29 16:22:37 +02:00
Ines Montani	aa2a6882d0	Fix logging	2020-09-29 16:08:39 +02:00
Sofie Van Landeghem	6a04e5adea	encoding UTF8 (#6161)	2020-09-29 14:49:55 +02:00
Ines Montani	4925ad760a	Add init vectors	2020-09-29 10:58:50 +02:00
Ines Montani	ff9a63bfbd	begin_training -> initialize	2020-09-28 21:35:09 +02:00
Ines Montani	a139fe672b	Fix typos and refactor CLI logging	2020-09-28 21:17:10 +02:00
Ines Montani	2e9c9e74af	Fix config resolution and interpolation TODO: auto-interpolate in Thinc if config is dict (i.e. likely subsection)	2020-09-28 15:34:00 +02:00
Ines Montani	822ea4ef61	Refactor CLI	2020-09-28 15:09:59 +02:00
Ines Montani	a89e0ff7cb	Fix typo	2020-09-28 12:55:21 +02:00
Ines Montani	a62337b3f3	Tidy up vocab init	2020-09-28 12:53:06 +02:00
Ines Montani	c22ecc66bb	Don't support init path for now	2020-09-28 12:46:28 +02:00
Ines Montani	a5f2cc0509	Tidy up and remove raw text (rehearsal) for now	2020-09-28 12:30:13 +02:00
Ines Montani	1590de11b1	Update config	2020-09-28 12:05:23 +02:00
Ines Montani	e44a7519cd	Update CLI and add [initialize] block	2020-09-28 11:56:14 +02:00
Ines Montani	d5155376fd	Update vocab init	2020-09-28 11:30:18 +02:00
Ines Montani	8b74fd19df	init pipeline -> init nlp	2020-09-28 11:13:38 +02:00
Ines Montani	2fdb7285a0	Update CLI	2020-09-28 11:06:07 +02:00
Ines Montani	553bfea641	Fix commands	2020-09-28 10:53:17 +02:00
Matthew Honnibal	44bad1474c	Add init_pipeline file	2020-09-28 09:47:34 +02:00
Matthew Honnibal	b886f53c31	init-pipeline runs (maybe doesnt work)	2020-09-28 03:42:47 +02:00
Matthew Honnibal	ed2aff2db3	Remove unused train code	2020-09-28 03:12:31 +02:00
Matthew Honnibal	3a0a3b8db6	Dont hard-code for 'corpora' name	2020-09-28 03:06:33 +02:00
Matthew Honnibal	a976da168c	Support data augmentation in Corpus (#6155 ) * Support data augmentation in Corpus * Note initial docs for data augmentation * Add augmenter to quickstart * Fix flake8 * Format * Fix test * Update spacy/tests/training/test_training.py * Improve data augmentation arguments * Update templates * Move randomization out into caller * Refactor * Update spacy/training/augment.py * Update spacy/tests/training/test_training.py * Fix augment * Fix test	2020-09-28 03:03:27 +02:00
Matthew Honnibal	a3e1791c9c	Upd train	2020-09-28 01:08:30 +02:00
Matthew Honnibal	b5556093e2	Start updating train script	2020-09-27 23:59:44 +02:00
Ines Montani	e04bd16f7f	Merge branch 'develop' into feature/new-thinc-config-resolution	2020-09-27 22:34:46 +02:00
Ines Montani	d7ad65a9bb	Fix handling of error description [ci skip]	2020-09-27 22:31:57 +02:00
Ines Montani	7e938ed63e	Update config resolution to use new Thinc	2020-09-27 22:21:31 +02:00
Matthew Honnibal	39b178999c	Tmp notes	2020-09-27 20:13:38 +02:00
Ines Montani	b4486d747d	Merge branch 'develop' into fix/train-config-interpolation	2020-09-26 15:32:14 +02:00
Ines Montani	b2d07de786	Construct nlp from uninterpolated config before training	2020-09-26 15:16:59 +02:00
Ines Montani	ca3c997062	Improve CLI config validation with latest Thinc	2020-09-26 13:13:57 +02:00
Matthew Honnibal	3d8388969e	Sort paths for cache consistency	2020-09-25 19:07:26 +02:00
Sofie Van Landeghem	009ba14aaf	Fix pretraining in train script (#6143) * update pretraining API in train CLI * bump thinc to 8.0.0a35 * bump to 3.0.0a26 * doc fixes * small doc fix	2020-09-25 15:47:10 +02:00
Matthew Honnibal	74ee456374	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2020-09-24 16:11:47 +02:00
Matthew Honnibal	0bc214c102	Fix pull	2020-09-24 16:11:33 +02:00
Ines Montani	74e1f192b4	Merge pull request #6134 from explosion/feature/training_before_to_disk	2020-09-24 14:44:11 +02:00
Ines Montani	24e7ac3f2b	Fix download CLI [ci skip]	2020-09-24 14:43:56 +02:00
Ines Montani	88e54caa12	accuracy -> performance	2020-09-24 14:32:35 +02:00
Ines Montani	be56c0994b	Add [training.before_to_disk] callback	2020-09-24 12:40:25 +02:00
Ines Montani	c6c67b606e	Merge pull request #6133 from explosion/fix/score_weights	2020-09-24 12:00:57 +02:00
Ines Montani	f69fea8b25	Improve error handling around non-number scores	2020-09-24 11:29:07 +02:00
Matthew Honnibal	17a6b0a173	Make project pull order insensitive (#6131)	2020-09-24 10:30:42 +02:00
Ines Montani	ae51f580c1	Fix handling of score_weights	2020-09-24 10:27:33 +02:00
svlandeg	35dbc63578	Merge remote-tracking branch 'upstream/develop' into fix/nr_features # Conflicts: # spacy/ml/models/parser.py # spacy/tests/serialize/test_serialize_config.py # website/docs/api/architectures.md	2020-09-23 17:01:13 +02:00
svlandeg	dd2292793f	'parser' instead of 'deps' for state_type	2020-09-23 16:53:49 +02:00
svlandeg	6c85fab316	state_type and extra_state_tokens instead of nr_feature_tokens	2020-09-23 13:35:09 +02:00
Ines Montani	7745d77a38	Fix whitespace in template [ci skip]	2020-09-23 13:21:42 +02:00
svlandeg	6435458d51	simplify expression	2020-09-23 12:12:38 +02:00
svlandeg	20b0ec5dcf	avoid logging performance of frozen components	2020-09-23 10:37:12 +02:00
Ines Montani	6ca06cb62c	Update docs and formatting [ci skip]	2020-09-23 10:14:27 +02:00
Ines Montani	888f936a73	Merge pull request #6106 from svlandeg/feature/textcat-quickstart	2020-09-23 10:11:45 +02:00
Ines Montani	60a317520a	Merge pull request #6109 from svlandeg/feature/2rename	2020-09-23 09:47:12 +02:00
svlandeg	556f3e4652	add pooling to NEL's TransformerListener	2020-09-23 09:24:28 +02:00
Sofie Van Landeghem	86a08f819d	tok2vec.update instead of predict (#6113)	2020-09-22 21:54:52 +02:00
Ines Montani	5e3b796b12	Validate section refs in debug config	2020-09-22 12:24:39 +02:00
svlandeg	085a1c8e2b	add no_output_layer to TextCatBOW config	2020-09-22 12:06:40 +02:00
svlandeg	b556a10808	rename converts in_to_out	2020-09-22 11:50:19 +02:00
svlandeg	e931f4d757	add textcat score	2020-09-22 10:56:43 +02:00
svlandeg	396b33257f	add entity_linker to jinja template	2020-09-22 10:40:05 +02:00
svlandeg	135de82a2d	add textcat to quickstart	2020-09-22 10:22:06 +02:00
Ines Montani	6316d5f398	Improve messages in project CLI [ci skip]	2020-09-22 09:45:34 +02:00
Ines Montani	81606b29bd	Merge pull request #6104 from svlandeg/fix/debug_model [ci skip]	2020-09-22 09:31:23 +02:00
svlandeg	45b29c4a5b	cleanup	2020-09-21 23:17:23 +02:00
svlandeg	fa5c416db6	initialize through nlp object and with train_corpus	2020-09-21 23:09:22 +02:00
svlandeg	447b3e5787	Merge remote-tracking branch 'upstream/develop' into fix/debug_model # Conflicts: # spacy/cli/debug_model.py	2020-09-21 16:58:40 +02:00
Ines Montani	e8bcaa44f1	Don't auto-decompress archives with smart_open [ci skip]	2020-09-21 16:01:46 +02:00
svlandeg	eb9b447960	Merge remote-tracking branch 'upstream/develop' into fix/debug_model # Conflicts: # spacy/cli/debug_model.py	2020-09-21 14:05:16 +02:00
Ines Montani	758ead8a47	Sync overrides with CLI overrides	2020-09-21 12:50:13 +02:00
Ines Montani	5497acf49a	Support config overrides via environment variables	2020-09-21 11:25:10 +02:00
Ines Montani	1114219ae3	Tidy up and auto-format	2020-09-21 10:59:07 +02:00
Ines Montani	b2302c0a1c	Improve error for missing dependency	2020-09-20 17:44:51 +02:00
Matthew Honnibal	8fb59d958c	Format	2020-09-20 16:31:48 +02:00
Matthew Honnibal	dc22771f87	Fix sparse checkout	2020-09-20 16:30:05 +02:00
Matthew Honnibal	a0fb5e50db	Use simple git clone call if not sparse	2020-09-20 16:22:04 +02:00
Matthew Honnibal	2c24d633d0	Use updated run_command	2020-09-20 16:21:43 +02:00
Ines Montani	554c9a2497	Update docs [ci skip]	2020-09-20 12:30:53 +02:00
svlandeg	6db1d5dc0d	trying some stuff	2020-09-19 19:11:30 +02:00
Ines Montani	e863b3dc14	Merge pull request #6092 from adrianeboyd/bugfix/load-vocab-lookups-2	2020-09-19 12:33:38 +02:00
Sofie Van Landeghem	39872de1f6	Introducing the gpu_allocator (#6091 ) * rename 'use_pytorch_for_gpu_memory' to 'gpu_allocator' * --code instead of --code-path * update documentation * avoid querying the "system" section directly * add explanation of gpu_allocator to TF/PyTorch section in docs * fix typo * fix typo 2 * use set_gpu_allocator from thinc 8.0.0a34 * default null instead of empty string	2020-09-19 01:17:02 +02:00
svlandeg	73ff52b9ec	hack for tok2vec listener	2020-09-18 16:43:15 +02:00
Adriane Boyd	eed4b785f5	Load vocab lookups tables at beginning of training Similar to how vectors are handled, move the vocab lookups to be loaded at the start of training rather than when the vocab is initialized, since the vocab doesn't have access to the full config when it's created. The option moves from `nlp.load_vocab_data` to `training.lookups`. Typically these tables will come from `spacy-lookups-data`, but any `Lookups` object can be provided. The loading from `spacy-lookups-data` is now strict, so configs for each language should specify the exact tables required. This also makes it easier to control whether the larger clusters and probs tables are included. To load `lexeme_norm` from `spacy-lookups-data`: ``` [training.lookups] @misc = "spacy.LoadLookupsData.v1" lang = ${nlp.lang} tables = ["lexeme_norm"] ```	2020-09-18 15:59:16 +02:00
Ines Montani	a127fa475e	Merge pull request #6078 from svlandeg/fix/corpus	2020-09-18 14:44:21 +02:00
svlandeg	e4fc7e0222	fixing output sample to proper 2D array	2020-09-17 22:34:36 +02:00
Ines Montani	3865214343	Use consistent shortcut	2020-09-17 16:57:02 +02:00
svlandeg	35a3931064	fix typo	2020-09-17 16:36:27 +02:00
svlandeg	ddfc1fc146	add pretraining option to init config	2020-09-17 16:05:40 +02:00
svlandeg	427dbecdd6	cleanup and formatting	2020-09-17 11:48:04 +02:00
svlandeg	0c35885751	generalize corpora, dot notation for dev and train corpus	2020-09-17 11:38:59 +02:00
svlandeg	51fa929f47	rewrite train_corpus to corpus.train in config	2020-09-15 21:58:04 +02:00
Ines Montani	9cc304c194	Merge pull request #6064 from explosion/fix/sparse-checkout-ux Fix sparse checkout and error handling	2020-09-15 00:32:20 +02:00

1227 Commits
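
Note on commit 0a86d833d7: its message gives the mutable-default-argument pitfall as the reason for defaulting `exclude` to None instead of an empty list. The sketch below illustrates that pattern in isolation. The function names `info_unsafe` and `info_safe` are hypothetical and are not the actual signatures in spacy/cli/info.py; only the `exclude` parameter name comes from the commit message.

```python
from typing import List, Optional


def info_unsafe(key: str, exclude: List[str] = []) -> List[str]:
    # Hypothetical example: the default list is created once, when the
    # function is defined, and the same object is reused by every call.
    exclude.append(key)
    return exclude


def info_safe(key: str, exclude: Optional[List[str]] = None) -> List[str]:
    # Hypothetical example: default to None and build a fresh list per call,
    # which is the approach the commit message describes.
    if exclude is None:
        exclude = []
    exclude.append(key)
    return exclude


if __name__ == "__main__":
    info_unsafe("a")
    print(info_unsafe("b"))  # state from the first call leaks into the second
    info_safe("a")
    print(info_safe("b"))  # each call starts from its own empty list
```

With the None default, callers that omit `exclude` never share state, while callers that pass their own list still get it mutated as before.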