spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-07-09 15:52:31 +03:00

Author	SHA1	Message	Date
Ines Montani	1980203229	Merge branch 'master' into pr/6444	2020-12-09 11:09:40 +11:00
Koichi Yasuoka	0afb54ac93	JapaneseTokenizer.pipe added (#6515) * JapaneseTokenizer.pipe added For [spacymoji](https://spacy.io/universe/project/spacymoji) with `Japanese()`. * DummyTokenizer.pipe added instead	2020-12-08 20:02:23 +01:00
Ines Montani	d25b1606d6	Allow reading config from stdin in spacy train	2020-12-08 18:01:40 +11:00
svlandeg	1f465bea18	if-else	2020-10-13 09:27:19 +02:00
Ines Montani	bfa3931c9d	Revert added_strings change (#6236)	2020-10-10 18:55:07 +02:00
svlandeg	040c7c0541	fix get_dim calls in build_simple_cnn_text_classifier	2020-10-09 15:40:58 +02:00
Florijan Stamenković	18f5c309dc	Fix Issue 6207 (#6208) * Regression test for issue 6207 * Fix issue 6207 * Sign contributor agreement * Minor adjustments to test Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2020-10-09 10:14:40 +02:00
Sofie Van Landeghem	d093d6343b	TrainablePipe (#6213) * rename Pipe to TrainablePipe * split functionality between Pipe and TrainablePipe * remove unnecessary methods from certain components * cleanup * hasattr(component, "pipe") should be sufficient again * remove serialization and vocab/cfg from Pipe * unify _ensure_examples and validate_examples * small fixes * hasattr checks for self.cfg and self.vocab * make is_resizable and is_trainable properties * serialize strings.json instead of vocab * fix KB IO + tests * fix typos * more typos * _added_strings as a set * few more tests specifically for _added_strings field * bump to 3.0.0a36	2020-10-08 21:33:49 +02:00
Florijan Stamenković	9db670b996	Fix Issue 6207 (#6208) * Regression test for issue 6207 * Fix issue 6207 * Sign contributor agreement * Minor adjustments to test Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2020-10-06 11:17:37 +02:00
Ines Montani	0135f6ed95	Enable commit check via env var	2020-10-05 20:51:15 +02:00
Ines Montani	6958510bda	Include spaCy version check in project CLI	2020-10-05 13:53:07 +02:00
Ines Montani	f758804401	Save one line of code	2020-10-03 11:41:28 +02:00
svlandeg	02247cccaf	Merge remote-tracking branch 'upstream/develop' into feature/small-fixes	2020-10-02 20:48:11 +02:00
svlandeg	acc391c2a8	remove redundant str() call	2020-10-02 11:05:59 +02:00
Ines Montani	01c1538c72	Integrate file readers	2020-10-02 01:36:06 +02:00
Ines Montani	23c63eefaf	Tidy up env vars [ci skip]	2020-09-30 15:15:11 +02:00
Ines Montani	a5debb356d	Tidy up and adjust logging [ci skip]	2020-09-30 01:22:08 +02:00
Ines Montani	da30bae8a6	Use __pyx_vtable__ instead of __reduce_cython__	2020-09-29 22:04:17 +02:00
Ines Montani	fa47f87924	Tidy up and auto-format	2020-09-29 21:39:28 +02:00
Ines Montani	d3c63b7965	Merge branch 'develop' into feature/prepare	2020-09-29 20:53:05 +02:00
Ines Montani	2be80379ec	Fix small issues, resolve_dot_names and debug model	2020-09-29 20:38:35 +02:00
Ines Montani	dba26186ef	Handle None default args in Cython methods	2020-09-29 18:08:02 +02:00
Ines Montani	9353a82076	Auto-format	2020-09-29 18:07:48 +02:00
Matthew Honnibal	4ad26f4a2f	Move reader	2020-09-29 16:54:53 +02:00
Ines Montani	2e9c9e74af	Fix config resolution and interpolation TODO: auto-interpolate in Thinc if config is dict (i.e. likely subsection)	2020-09-28 15:34:00 +02:00
Ines Montani	02838a1d47	Fix resolve_dot_names	2020-09-28 15:27:10 +02:00
Ines Montani	822ea4ef61	Refactor CLI	2020-09-28 15:09:59 +02:00
Ines Montani	e44a7519cd	Update CLI and add [initialize] block	2020-09-28 11:56:14 +02:00
Ines Montani	d5155376fd	Update vocab init	2020-09-28 11:30:18 +02:00
Matthew Honnibal	65448b2e34	Remove schema=None until Optional	2020-09-28 03:42:58 +02:00
Matthew Honnibal	a023cf3ecc	Add (untested) resolve_dot_names util	2020-09-28 03:06:12 +02:00
Matthew Honnibal	a976da168c	Support data augmentation in Corpus (#6155) * Support data augmentation in Corpus * Note initial docs for data augmentation * Add augmenter to quickstart * Fix flake8 * Format * Fix test * Update spacy/tests/training/test_training.py * Improve data augmentation arguments * Update templates * Move randomization out into caller * Refactor * Update spacy/training/augment.py * Update spacy/tests/training/test_training.py * Fix augment * Fix test	2020-09-28 03:03:27 +02:00
Ines Montani	9016d23cc5	Fix exclude and add test	2020-09-27 23:34:03 +02:00
Ines Montani	7e938ed63e	Update config resolution to use new Thinc	2020-09-27 22:21:31 +02:00
Ines Montani	26e28ed413	Fix combined scores if multiple components report it	2020-09-24 17:11:13 +02:00
Ines Montani	d0ef4a4cf5	Prevent division by zero in score weights	2020-09-24 16:42:13 +02:00
Ines Montani	4bbe41f017	Fix combined scores and update test	2020-09-24 10:42:47 +02:00
Ines Montani	ae51f580c1	Fix handling of score_weights	2020-09-24 10:27:33 +02:00
Ines Montani	f25f05c503	Adjust sort order [ci skip]	2020-09-23 20:03:04 +02:00
Matthew Honnibal	8fb59d958c	Format	2020-09-20 16:31:48 +02:00
Matthew Honnibal	889128e5c5	Improve error handling in run_command	2020-09-20 16:20:57 +02:00
Adriane Boyd	47080fba98	Minor renaming / refactoring * Rename loader to `spacy.LookupsDataLoader.v1`, add debugging message * Make `Vocab.lookups` a property	2020-09-18 19:43:19 +02:00
Adriane Boyd	eed4b785f5	Load vocab lookups tables at beginning of training Similar to how vectors are handled, move the vocab lookups to be loaded at the start of training rather than when the vocab is initialized, since the vocab doesn't have access to the full config when it's created. The option moves from `nlp.load_vocab_data` to `training.lookups`. Typically these tables will come from `spacy-lookups-data`, but any `Lookups` object can be provided. The loading from `spacy-lookups-data` is now strict, so configs for each language should specify the exact tables required. This also makes it easier to control whether the larger clusters and probs tables are included. To load `lexeme_norm` from `spacy-lookups-data`: ``` [training.lookups] @misc = "spacy.LoadLookupsData.v1" lang = ${nlp.lang} tables = ["lexeme_norm"] ```	2020-09-18 15:59:16 +02:00
Ines Montani	c052017025	Fix sparse checkout and error handling	2020-09-14 14:12:58 +02:00
Ines Montani	416deb412f	Prevent duplicate traceback on CalledProcessError [ci skip]	2020-09-13 19:28:54 +02:00
Ines Montani	f8846c198d	Update types and docstrings	2020-09-13 10:52:02 +02:00
Ines Montani	3e83a509bb	WIP: fix project clone compatibility	2020-09-10 15:49:13 +02:00
Matthew Honnibal	b470062153	Add CLI registry (#6037)	2020-09-08 15:23:34 +02:00
Ines Montani	5afe6447cd	registry.assets -> registry.misc	2020-09-03 17:31:14 +02:00
Ines Montani	45f46a5c85	Merge pull request #5993 from explosion/feature/disabled-components	2020-08-29 15:58:41 +02:00

431 Commits