spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-09-17 01:22:37 +03:00

Author	SHA1	Message	Date
Ines Montani	744f259b9c	Update landing [ci skip]	2020-09-20 16:37:23 +02:00
Matthew Honnibal	8fb59d958c	Format	2020-09-20 16:31:48 +02:00
Matthew Honnibal	dc22771f87	Fix sparse checkout	2020-09-20 16:30:05 +02:00
Matthew Honnibal	a0fb5e50db	Use simple git clone call if not sparse	2020-09-20 16:22:04 +02:00
Matthew Honnibal	2c24d633d0	Use updated run_command	2020-09-20 16:21:43 +02:00
Matthew Honnibal	889128e5c5	Improve error handling in run_command	2020-09-20 16:20:57 +02:00
Ines Montani	554c9a2497	Update docs [ci skip]	2020-09-20 12:30:53 +02:00
svlandeg	6db1d5dc0d	trying some stuff	2020-09-19 19:11:30 +02:00
Ines Montani	e863b3dc14	Merge pull request #6092 from adrianeboyd/bugfix/load-vocab-lookups-2	2020-09-19 12:33:38 +02:00
Sofie Van Landeghem	39872de1f6	Introducing the gpu_allocator (#6091) * rename 'use_pytorch_for_gpu_memory' to 'gpu_allocator' * --code instead of --code-path * update documentation * avoid querying the "system" section directly * add explanation of gpu_allocator to TF/PyTorch section in docs * fix typo * fix typo 2 * use set_gpu_allocator from thinc 8.0.0a34 * default null instead of empty string	2020-09-19 01:17:02 +02:00
Adriane Boyd	47080fba98	Minor renaming / refactoring * Rename loader to `spacy.LookupsDataLoader.v1`, add debugging message * Make `Vocab.lookups` a property	2020-09-18 19:43:19 +02:00
svlandeg	73ff52b9ec	hack for tok2vec listener	2020-09-18 16:43:15 +02:00
Adriane Boyd	eed4b785f5	Load vocab lookups tables at beginning of training Similar to how vectors are handled, move the vocab lookups to be loaded at the start of training rather than when the vocab is initialized, since the vocab doesn't have access to the full config when it's created. The option moves from `nlp.load_vocab_data` to `training.lookups`. Typically these tables will come from `spacy-lookups-data`, but any `Lookups` object can be provided. The loading from `spacy-lookups-data` is now strict, so configs for each language should specify the exact tables required. This also makes it easier to control whether the larger clusters and probs tables are included. To load `lexeme_norm` from `spacy-lookups-data`: ``` [training.lookups] @misc = "spacy.LoadLookupsData.v1" lang = ${nlp.lang} tables = ["lexeme_norm"] ```	2020-09-18 15:59:16 +02:00
Ines Montani	0406200a1e	Update docs [ci skip]	2020-09-18 15:13:13 +02:00
Ines Montani	a127fa475e	Merge pull request #6078 from svlandeg/fix/corpus	2020-09-18 14:44:21 +02:00
Matthew Honnibal	bbdb5f62b7	Temporary work-around for scoring a subset of components (#6090) * Try hacking the scorer to work around sentence boundaries * Upd scorer * Set dev version * Upd scorer hack * Fix version * Improve comment on hack	2020-09-18 14:26:42 +02:00
Ines Montani	d32ce121be	Fix docs [ci skip]	2020-09-18 13:41:12 +02:00
Adriane Boyd	a88106e852	Remove W106: HEAD and SENT_START in doc.from_array (#6086) * Remove W106: HEAD and SENT_START in doc.from_array This warning was hacky and being triggered too often. * Fix test	2020-09-18 03:01:29 +02:00
svlandeg	e4fc7e0222	fixing output sample to proper 2D array	2020-09-17 22:34:36 +02:00
Adriane Boyd	8b650f3a78	Modify setting missing and blocked entity tokens In order to make it easier to construct `Doc` objects as training data, modify how missing and blocked entity tokens are set to prioritize setting `O` and missing entity tokens for training purposes over setting blocked entity tokens. * `Doc.ents` setter sets tokens outside entity spans to `O` regardless of the current state of each token * For `Doc.ents`, setting a span with a missing label sets the `ent_iob` to missing instead of blocked * `Doc.block_ents(spans)` marks spans as hard `O` for use with the `EntityRecognizer`	2020-09-17 21:27:42 +02:00
Ines Montani	9062585a13	Merge pull request #6087 from explosion/docs/pretrain-usage [ci skip]	2020-09-17 19:25:24 +02:00
Ines Montani	a0b4389a38	Update docs [ci skip]	2020-09-17 19:24:48 +02:00
Matthew Honnibal	6efb7688a6	Draft pretrain usage	2020-09-17 18:17:03 +02:00
Sofie Van Landeghem	ed0fb034cb	ml_datasets v0.2.0a0	2020-09-17 18:11:10 +02:00
Ines Montani	1bb8b4f824	Merge branch 'master' into develop	2020-09-17 17:46:20 +02:00
Ines Montani	6bd0d25fb9	Merge pull request #6085 from explosion/docs/static-vectors-intro [ci skip]	2020-09-17 17:14:45 +02:00
Ines Montani	a2c8cda26f	Update docs [ci skip]	2020-09-17 17:12:51 +02:00
Ines Montani	2c80f41852	Merge pull request #6084 from svlandeg/feature/init-config-pretrain [ci skip]	2020-09-17 16:59:14 +02:00
Ines Montani	2e3ce9f42f	Merge branch 'feature/init-config-pretrain' of https://github.com/svlandeg/spaCy into pr/6084	2020-09-17 16:58:49 +02:00
Ines Montani	3d8e010655	Change order	2020-09-17 16:58:46 +02:00
Ines Montani	c4b414b282	Update website/docs/api/cli.md	2020-09-17 16:58:09 +02:00
Ines Montani	3865214343	Use consistent shortcut	2020-09-17 16:57:02 +02:00
Sofie Van Landeghem	e5ceec5df0	Update website/docs/api/cli.md Co-authored-by: Ines Montani <ines@ines.io>	2020-09-17 16:56:20 +02:00
Sofie Van Landeghem	127ce0c574	Update website/docs/api/cli.md Co-authored-by: Ines Montani <ines@ines.io>	2020-09-17 16:55:53 +02:00
Matthew Honnibal	ec751068f3	Draft text for static vectors intro	2020-09-17 16:42:53 +02:00
svlandeg	35a3931064	fix typo	2020-09-17 16:36:27 +02:00
svlandeg	5fade4feb7	fix cli abbrev	2020-09-17 16:15:20 +02:00
svlandeg	ddfc1fc146	add pretraining option to init config	2020-09-17 16:05:40 +02:00
svlandeg	3a3110ef60	remove empty files	2020-09-17 15:44:11 +02:00
svlandeg	c8c84f1ccd	Merge remote-tracking branch 'upstream/develop' into fix/corpus	2020-09-17 15:43:04 +02:00
svlandeg	130ffa5fbf	fix typos in docs	2020-09-17 14:59:41 +02:00
Matthew Honnibal	b57ce9a875	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2020-09-17 13:59:25 +02:00
Matthew Honnibal	30e85b2a42	Remove outdated configs	2020-09-17 13:59:12 +02:00
Ines Montani	c8fa2247e3	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2020-09-17 12:34:15 +02:00
Ines Montani	6761028c6f	Update docs [ci skip]	2020-09-17 12:34:11 +02:00
svlandeg	427dbecdd6	cleanup and formatting	2020-09-17 11:48:04 +02:00
svlandeg	0c35885751	generalize corpora, dot notation for dev and train corpus	2020-09-17 11:38:59 +02:00
svlandeg	8cedb2f380	Merge branch 'fix/corpus' of https://github.com/svlandeg/spaCy into fix/corpus	2020-09-17 09:27:55 +02:00
svlandeg	781fae678b	Merge remote-tracking branch 'upstream/develop' into fix/corpus	2020-09-17 09:24:36 +02:00
Sofie Van Landeghem	21dcf92964	Update website/docs/api/data-formats.md Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>	2020-09-17 09:21:36 +02:00

13976 Commits