spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-02-02 13:36:18 +03:00

Author	SHA1	Message	Date
Adriane Boyd	39ebcd9ec9	Refactor Chinese tokenizer configuration (#5736) * Refactor Chinese tokenizer configuration Refactor `ChineseTokenizer` configuration so that it uses a single `segmenter` setting to choose between character segmentation, jieba, and pkuseg. * replace `use_jieba`, `use_pkuseg`, `require_pkuseg` with the setting `segmenter` with the supported values: `char`, `jieba`, `pkuseg` * make the default segmenter plain character segmentation `char` (no additional libraries required) * Fix Chinese serialization test to use char default * Warn if attempting to customize other segmenter Add a warning if `Chinese.pkuseg_update_user_dict` is called when another segmenter is selected.	2020-07-19 13:34:37 +02:00
Adriane Boyd	9ee1c54f40	Improve tag map initialization and updating (#5764) * Improve tag map initialization and updating Generalize tag map initialization and updating so that the tag map can be loaded correctly prior to loading a `Corpus` with `spacy debug-data` and `spacy train`. * normalize provided tag map as necessary * use the same method for initializing and updating the tag map * Replace rather than update tag map Replace rather than update tag map when loading a custom tag map. Updating the tag map is problematic due to the sorted list of tag names and the fact that the tag map will contain lingering/unwanted tags from the default tag map. * Update CLI scripts * Reinitialize cache after loading new tag map Reinitialize the cache with the right size after loading a new tag map.	2020-07-19 13:13:57 +02:00
Adriane Boyd	b81a89f0a9	Update morphologizer (#5766) * update `Morphologizer.begin_training` for use with `Example` * make init and begin_training more consistent * add `Morphology.normalize_features` to normalize outside of `Morphology.add` * make sure `get_loss` doesn't create unknown labels when the POS and morph alignments differ	2020-07-19 11:10:51 +02:00
Sofie Van Landeghem	38b59d728d	Upgrade of UD eval script (#5776) * new morph feature format * add new languages with tokenization * update with all new pretrained models	2020-07-19 11:10:31 +02:00
Ines Montani	68fade8f76	Add Plausible [ci skip]	2020-07-19 00:02:29 +02:00
Adriane Boyd	a7a7e0d2a6	Add morph to morphology in Doc.from_array (#5762) * Add morph to morphology in Doc.from_array Add morphological analyses to morphology table in `Doc.from_array`. * Use separate vocab in DocBin roundtrip test	2020-07-14 14:07:35 +02:00
Ines Montani	872938ec76	Merge pull request #5747 from explosion/feature/refactor-config-args	2020-07-14 00:00:22 +02:00
Sofie Van Landeghem	6f3bb6f77c	fix doc.to_utf8 on GPU (#5757)	2020-07-13 23:05:33 +02:00
Ines Montani	dcfa910e4e	Merge pull request #5752 from explosion/compat/remove-object-subclass	2020-07-12 16:37:04 +02:00
Ines Montani	ed55143c0d	Merge branch 'develop' into compat/remove-object-subclass	2020-07-12 14:28:52 +02:00
Ines Montani	7906ddd56c	Fix test	2020-07-12 14:28:34 +02:00
Ines Montani	5f6f4ff594	Remove object subclassing	2020-07-12 14:03:23 +02:00
Ines Montani	c96535e338	Update command docstrings and docs	2020-07-12 13:53:49 +02:00
Ines Montani	0ab483037c	Make debug commands subcommands of spacy debug Also handle backwards-compatibility so the old commands don't break	2020-07-12 13:53:41 +02:00
Ines Montani	3f948b9c74	Update docs	2020-07-12 12:32:28 +02:00
Ines Montani	8a67ddd6f1	Remove unused import	2020-07-12 12:32:24 +02:00
Ines Montani	d1d7fd5f5d	Don't use file paths in schemas It should be possible to validate top-level config with file paths that don't exist	2020-07-12 12:32:08 +02:00
Ines Montani	79346853aa	Add debug-config command	2020-07-12 12:31:17 +02:00
Ines Montani	3a8632c3fb	Hide command from public --help for now Not sure we want this to be officially documented yet?	2020-07-11 19:21:22 +02:00
Ines Montani	5e683d03fe	Allow extra args on pretrain and debug_data	2020-07-11 19:17:59 +02:00
Ines Montani	70abcca60e	Update Thinc pin	2020-07-11 17:02:54 +02:00
Ines Montani	b7111da1d7	Update config and commands	2020-07-11 13:03:53 +02:00
Ines Montani	11bbc82c24	Update cli.md [ci skip]	2020-07-10 23:37:52 +02:00
Ines Montani	9e48ea48a1	Update Thinc pin	2020-07-10 23:34:57 +02:00
Ines Montani	f99ce7fbfb	Make validation errors more elegant	2020-07-10 23:34:17 +02:00
Ines Montani	9455b060d2	Update cli.md	2020-07-10 22:57:22 +02:00
Ines Montani	7b5717cac3	Merge branch 'develop' into feature/refactor-config-args	2020-07-10 22:50:07 +02:00
Ines Montani	e6a6587a9a	Update projects.md [ci skip]	2020-07-10 22:41:27 +02:00
Matthew Honnibal	743f7fb73a	Set version to v3.0.0a4	2020-07-10 22:40:12 +02:00
Matthew Honnibal	b68216e263	Explicitly delete objects after parser.update to free GPU memory (#5748) * Try explicitly deleting objects * Refactor parser model backprop slightly * Free parser data explicitly after rehearse and update	2020-07-10 22:35:20 +02:00
Ines Montani	f2cd982e7b	Update training.md	2020-07-10 22:34:27 +02:00
Ines Montani	fb6f6f584e	Replace - with _ in command names We might as well be nice if user accidentally types --training.use-gpu	2020-07-10 22:34:22 +02:00
Ines Montani	bfa8e11ffa	Update and auto-format	2020-07-10 20:52:00 +02:00
Ines Montani	0389c34b81	Merge branch 'develop' into feature/refactor-config-args	2020-07-10 20:51:52 +02:00
Ines Montani	931250e1f5	Fix pipeline component schema	2020-07-10 20:32:53 +02:00
Ines Montani	9fe1fa88ad	Fix typo	2020-07-10 20:32:37 +02:00
Ines Montani	459c6aa8f0	Merge branch 'feature/refactor-config-args' of https://github.com/explosion/spaCy into feature/refactor-config-args	2020-07-10 20:01:28 +02:00
Ines Montani	defe1e7213	Pretty-print config validation errors	2020-07-10 20:01:20 +02:00
Matthew Honnibal	894f31226b	Update config	2020-07-10 19:59:12 +02:00
Sofie Van Landeghem	de6a32315c	debug-model script (#5749) * adding debug-model to print the internals for debugging purposes * expand debug-model script with 4 stages: before, init, train, predict * avoid enforcing to have a seed in the train script * small fixes	2020-07-10 19:47:53 +02:00
Ines Montani	a3667394b4	Integrate with latest Thinc and config overrides	2020-07-10 19:47:05 +02:00
Ines Montani	5cfc3edcaa	Update CLI tests	2020-07-10 18:21:01 +02:00
Ines Montani	3583ea84d8	Update arg parsing	2020-07-10 18:20:52 +02:00
Ines Montani	73332ddb67	Update CLI commands to use one shared util file	2020-07-10 17:57:40 +02:00
Ines Montani	240e0a62ca	Update with WIP	2020-07-10 13:31:27 +02:00
Ines Montani	a60562f208	Update project CLI hashes, directories, skipping (#5741) * Update project CLI hashes, directories, skipping * Improve clone success message * Remove unused context args * Move project-specific utils to project utils The hashing/checksum functions may not end up being general-purpose functions and are more designed for the projects, so they shouldn't live in spacy.util * Improve run help and add workflows * Add note re: directory checksum speed * Fix cloning from subdirectories and output messages * Remove hard-coded dirs	2020-07-09 23:51:18 +02:00
Ines Montani	e624fcd5d9	Merge branch 'nightly.spacy.io' into develop	2020-07-09 23:26:26 +02:00
Ines Montani	52e9b5b472	Fix formatting	2020-07-09 23:25:58 +02:00
Ines Montani	28cdae898a	Update projects.md	2020-07-09 22:35:54 +02:00
Ines Montani	7bcf9f7cfb	Document new features	2020-07-09 21:10:36 +02:00


12224 Commits