spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-02-05 14:59:59 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	8fb59d958c	Format	2020-09-20 16:31:48 +02:00
Matthew Honnibal	dc22771f87	Fix sparse checkout	2020-09-20 16:30:05 +02:00
Matthew Honnibal	a0fb5e50db	Use simple git clone call if not sparse	2020-09-20 16:22:04 +02:00
Matthew Honnibal	2c24d633d0	Use updated run_command	2020-09-20 16:21:43 +02:00
Ines Montani	554c9a2497	Update docs [ci skip]	2020-09-20 12:30:53 +02:00
Ines Montani	e863b3dc14	Merge pull request #6092 from adrianeboyd/bugfix/load-vocab-lookups-2	2020-09-19 12:33:38 +02:00
Sofie Van Landeghem	39872de1f6	Introducing the gpu_allocator (#6091) * rename 'use_pytorch_for_gpu_memory' to 'gpu_allocator' * --code instead of --code-path * update documentation * avoid querying the "system" section directly * add explanation of gpu_allocator to TF/PyTorch section in docs * fix typo * fix typo 2 * use set_gpu_allocator from thinc 8.0.0a34 * default null instead of empty string	2020-09-19 01:17:02 +02:00
Adriane Boyd	eed4b785f5	Load vocab lookups tables at beginning of training Similar to how vectors are handled, move the vocab lookups to be loaded at the start of training rather than when the vocab is initialized, since the vocab doesn't have access to the full config when it's created. The option moves from `nlp.load_vocab_data` to `training.lookups`. Typically these tables will come from `spacy-lookups-data`, but any `Lookups` object can be provided. The loading from `spacy-lookups-data` is now strict, so configs for each language should specify the exact tables required. This also makes it easier to control whether the larger clusters and probs tables are included. To load `lexeme_norm` from `spacy-lookups-data`: ``` [training.lookups] @misc = "spacy.LoadLookupsData.v1" lang = ${nlp.lang} tables = ["lexeme_norm"] ```	2020-09-18 15:59:16 +02:00
Ines Montani	a127fa475e	Merge pull request #6078 from svlandeg/fix/corpus	2020-09-18 14:44:21 +02:00
Ines Montani	3865214343	Use consistent shortcut	2020-09-17 16:57:02 +02:00
svlandeg	ddfc1fc146	add pretraining option to init config	2020-09-17 16:05:40 +02:00
svlandeg	427dbecdd6	cleanup and formatting	2020-09-17 11:48:04 +02:00
svlandeg	0c35885751	generalize corpora, dot notation for dev and train corpus	2020-09-17 11:38:59 +02:00
svlandeg	51fa929f47	rewrite train_corpus to corpus.train in config	2020-09-15 21:58:04 +02:00
Ines Montani	9cc304c194	Merge pull request #6064 from explosion/fix/sparse-checkout-ux Fix sparse checkout and error handling	2020-09-15 00:32:20 +02:00
Sofie Van Landeghem	3216a33149	positive_label config for textcat (#6062) * hook up positive_label in textcat * unit tests * documentation * formatting * tests * fix typo * move verify_config to after begin_training * revert accidential commit	2020-09-14 17:08:00 +02:00
Ines Montani	c052017025	Fix sparse checkout and error handling	2020-09-14 14:12:58 +02:00
Matthew Honnibal	54c40223a1	Improve v3 pretrain command (#6040) * Starts to run * Update pretrain script * Update corpus * Update pretrain schema * Remove outdated test * Make JsonlTexts produce Example objects.	2020-09-13 14:05:05 +02:00
Ines Montani	febb99916d	Tidy up and auto-format [ci skip]	2020-09-13 10:55:36 +02:00
Ines Montani	a5633b205f	Fix handling of errors around git [ci skip]	2020-09-13 10:52:28 +02:00
Ines Montani	f8846c198d	Update types and docstrings	2020-09-13 10:52:02 +02:00
Matthew Honnibal	37347830d4	Fix reading in GloVe vectors	2020-09-12 17:31:18 +02:00
Ines Montani	b41be87213	Merge pull request #6051 from svlandeg/feature/cli-config	2020-09-12 17:12:35 +02:00
Ines Montani	eedaaaec75	Fix handling of existing asset without checksum [ci skip]	2020-09-12 17:02:53 +02:00
svlandeg	a75cfe0da6	Merge remote-tracking branch 'upstream/develop' into feature/cli-config	2020-09-12 14:44:40 +02:00
svlandeg	115147804a	string_to_list to parse comma-separated string into a list	2020-09-12 14:43:22 +02:00
Ines Montani	f886f5bbc8	Merge pull request #6048 from explosion/fix/clone-compat	2020-09-12 10:30:49 +02:00
Ines Montani	0b2e07215d	Support overwriting name on spacy package	2020-09-11 11:38:28 +02:00
svlandeg	5b94aeece9	support pipeline as "list in string"	2020-09-11 11:08:46 +02:00
Ines Montani	1bce432b4a	Adjust message [ci skip]	2020-09-11 10:00:49 +02:00
Ines Montani	5acd4fbcd8	Merge branch 'develop' into fix/clone-compat	2020-09-11 09:58:30 +02:00
Ines Montani	761bd60d43	Adjust info message	2020-09-11 09:57:00 +02:00
Ines Montani	6831161bfa	Resolve path to be extra sure	2020-09-11 09:56:49 +02:00
svlandeg	1723fb73c4	remove brol	2020-09-10 17:44:59 +02:00
svlandeg	08a831ce83	process trailing slash if any	2020-09-10 17:39:52 +02:00
Ines Montani	3e83a509bb	WIP: fix project clone compatibility	2020-09-10 15:49:13 +02:00
svlandeg	f1bc09c1e9	restore partly	2020-09-10 14:53:02 +02:00
svlandeg	3889747119	asset fix & UX	2020-09-10 14:36:53 +02:00
svlandeg	a36766d153	hookup branch	2020-09-10 12:00:34 +02:00
svlandeg	97d99f7efa	Merge remote-tracking branch 'upstream/develop' into feature/doc-fixes	2020-09-10 11:51:34 +02:00
Ines Montani	908f3a4494	Update default projects repo [ci skip]	2020-09-10 11:42:14 +02:00
svlandeg	92f9d2f406	small UX fixes	2020-09-10 11:35:50 +02:00
svlandeg	1fc5486792	more fine-grained errors for git_sparse_checkout	2020-09-10 11:31:32 +02:00
Ines Montani	15bc3a37b4	Add --branch to project clone	2020-09-10 11:08:15 +02:00
Sofie Van Landeghem	8e7557656f	Renaming gold & annotation_setter (#6042) * version bump to 3.0.0a16 * rename "gold" folder to "training" * rename 'annotation_setter' to 'set_extra_annotations' * formatting	2020-09-09 10:31:03 +02:00
Sofie Van Landeghem	60f22e1800	Pipe API (#6034) * ensure Language passes on valid examples for initialization * fix tagger model initialization * check for valid get_examples across components * assume labels were added before begin_training * fix senter initialization * fix morphologizer initialization * use methods to check arguments * test textcat init, requires thinc>=8.0.0a31 * fix tok2vec init * fix entity linker init * use islice * fix simple NER * cleanup debug model * fix assert statements * fix tests * throw error when adding a label if the output layer can't be resized anymore * fix test * add failing test for simple_ner * UX improvements * morphologizer UX * assume begin_training gets a representative set and processes the labels * remove assumptions for output of untrained NER model * restore test for original purpose	2020-09-08 22:44:25 +02:00
Matthew Honnibal	ba5f4c9b32	Add words and seconds to train info	2020-09-08 15:24:47 +02:00
Matthew Honnibal	b470062153	Add CLI registry (#6037)	2020-09-08 15:23:34 +02:00
Matthew Honnibal	4b7abaafdb	Fix learn rate for non-transformer	2020-09-04 21:22:50 +02:00
Matthew Honnibal	465785a672	Fix project pull and push	2020-09-04 21:15:55 +02:00

885 Commits