spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-11-14 21:57:15 +03:00

Author	SHA1	Message	Date
Ines Montani	9b86312bab	Update docs [ci skip]	2020-08-29 18:43:19 +02:00
Adriane Boyd	870774f475	Merge branch 'develop' into docs/morph-usage-v3	2020-08-29 16:00:50 +02:00
Ines Montani	45f46a5c85	Merge pull request #5993 from explosion/feature/disabled-components	2020-08-29 15:58:41 +02:00
Adriane Boyd	f9ed31a757	Update usage docs for lemmatization and morphology	2020-08-29 15:56:50 +02:00
Ines Montani	450bf806b0	Merge pull request #5991 from adrianeboyd/docs/sent-usage-v3 Update sentence segmentation usage docs	2020-08-29 12:40:06 +02:00
Ines Montani	66d76f5126	Update docs	2020-08-29 12:36:05 +02:00
svlandeg	5230529de2	add loggers registry & logger docs sections	2020-08-28 21:44:04 +02:00
Adriane Boyd	48df50533d	Update sentence segmentation usage docs Update sentence segmentation usage docs to incorporate `senter`.	2020-08-28 10:58:16 +02:00
svlandeg	72a87095d9	add loggers registry	2020-08-27 20:26:28 +02:00
svlandeg	aa9e0c9c39	small fix	2020-08-27 19:56:52 +02:00
svlandeg	8cde6ccb7d	Merge remote-tracking branch 'upstream/develop' into feature/vectors-docs	2020-08-27 19:56:09 +02:00
svlandeg	556e975a30	various fixes	2020-08-27 19:24:44 +02:00
Ines Montani	ff4175e839	Add more info to debug config	2020-08-27 18:17:58 +02:00
svlandeg	559b65f2e0	adjust references to null_annotation_setter to trfdata_setter	2020-08-27 09:43:32 +02:00
Ines Montani	696f167478	Add diff example to docs [ci skip]	2020-08-26 15:57:54 +02:00
Adriane Boyd	90d88729e0	Add AttributeRuler.score (#5963) * Add AttributeRuler.score Add scoring for TAG / POS / MORPH / LEMMA if these are present in the assigned token attributes. Add default score weights (that don't really make a lot of sense) so that the scores are in the default config in some form. * Update docs	2020-08-26 15:39:30 +02:00
svlandeg	ec069627fe	rename to TransformerListener	2020-08-26 13:31:01 +02:00
Ines Montani	627617a079	Tidy up and add docs [ci skip]	2020-08-26 13:24:55 +02:00
svlandeg	15902c5aa2	fix link	2020-08-26 11:51:57 +02:00
svlandeg	feb86d5206	clarify default	2020-08-26 11:21:30 +02:00
Ines Montani	8ac5ef1284	Update docs	2020-08-25 11:54:37 +02:00
Matthew Honnibal	e559867605	Allow spacy project to push and pull to/from remote storage (#5949) * Add utils for working with remote storage * WIP add remote_cache for project * WIP add push and pull commands * Use pathy in remote_cache * Updarte util * Update remote_cache * Update util * Update project assets * Update pull script * Update push script * Fix type annotation in util * Work on remote storage * Remove site and env hash * Fix imports * Fix type annotation * Require pathy * Require pathy * Fix import * Add a util to handle project variable substitution * Import push and pull commands * Fix pull command * Fix push command * Fix tarfile in remote_storage * Improve printing * Fiddle with status messages * Set version to v3.0.0a9 * Draft docs for spacy project remote storages * Update docs [ci skip] * Use Thinc config to simplify and unify template variables * Auto-format * Don't import Pathy globally for now Causes slow and annoying Google Cloud warning * Tidy up test * Tidy up and update tests * Update to latest Thinc * Update docs * variables -> vars * Update docs [ci skip] * Update docs [ci skip] Co-authored-by: Ines Montani <ines@ines.io>	2020-08-23 18:32:09 +02:00
Ines Montani	c7c9b0451f	Update docs [ci skip]	2020-08-22 13:52:52 +02:00
Ines Montani	71aeae89c5	Merge pull request #5948 from svlandeg/feature/docs-docs-docs [ci skip]	2020-08-22 12:18:47 +02:00
Ines Montani	f102164a1f	Update docs [ci skip]	2020-08-21 19:34:06 +02:00
svlandeg	1b7cfa7347	Merge remote-tracking branch 'upstream/develop' into feature/docs-docs-docs	2020-08-21 18:36:18 +02:00
svlandeg	dc98f69b57	alphabetize registries	2020-08-21 18:10:21 +02:00
svlandeg	518a1f97f3	remove outdated TODO's	2020-08-21 17:55:15 +02:00
svlandeg	e92bd6e1c1	alphabetize training lists	2020-08-21 17:42:19 +02:00
Ines Montani	74cb6d39d0	Update docs [ci skip]	2020-08-21 16:11:38 +02:00
Matthew Honnibal	f5bcc10268	Update architectures	2020-08-21 15:34:54 +02:00
Matthew Honnibal	7ed8f4504b	Update API docs for architectures	2020-08-21 15:22:19 +02:00
Ines Montani	52bd3a8b48	Update docs [ci skip]	2020-08-21 13:22:59 +02:00
Ines Montani	e60442d83a	Adjust label casing in displaCy NER visualizer (resolves #4866) - Accept any case for label names in ents and colors option, even if actual predicted label uses different casing - Don't text-transform: uppercase visually, if it's important to users that the label is represented as-is in the UI	2020-08-21 11:51:31 +02:00
Ines Montani	04e4d59235	Update docs [ci skip]	2020-08-20 16:17:25 +02:00
Sofie Van Landeghem	410b54e10e	Update website/docs/api/data-formats.md Co-authored-by: Ines Montani <ines@ines.io>	2020-08-20 11:15:34 +02:00
svlandeg	ae719b354f	fix typos	2020-08-20 10:20:40 +02:00
svlandeg	f728c00cbb	Merge remote-tracking branch 'upstream/develop' into feature/update-more-docs # Conflicts: # website/docs/api/data-formats.md	2020-08-20 10:02:13 +02:00
svlandeg	229033831a	add explanation of raw_text	2020-08-20 10:00:45 +02:00
Ines Montani	ea6640ea72	Merge pull request #5939 from explosion/feature/thinc-v8.0.0a28 Update Thinc and config variables	2020-08-19 21:14:36 +02:00
svlandeg	09f3cfc985	add version	2020-08-19 19:58:45 +02:00
svlandeg	7d9f00bdbf	waltzing schedule	2020-08-19 19:53:00 +02:00
Ines Montani	3dd390b1a1	Update Thinc and config variables	2020-08-19 19:46:12 +02:00
svlandeg	85b39639e1	small fix	2020-08-19 19:17:36 +02:00
svlandeg	169b5bcda0	Merge remote-tracking branch 'upstream/develop' into feature/update-docs # Conflicts: # website/docs/usage/training.md	2020-08-19 17:58:25 +02:00
svlandeg	7119295a8a	badgers intro	2020-08-19 17:53:22 +02:00
svlandeg	648499157a	rename "custom models" to "custom functions"	2020-08-19 16:53:51 +02:00
Ines Montani	63921161c8	Update docs [ci skip]	2020-08-19 16:04:21 +02:00
svlandeg	60fedb8518	fix 2 more API lines	2020-08-19 14:55:32 +02:00
svlandeg	2dfd919585	add kb_loader and get_candidates back to EL API	2020-08-19 14:52:49 +02:00
Ines Montani	225f8866a1	Fix consistency	2020-08-19 12:47:57 +02:00
Ines Montani	2285e59765	Merge pull request #5933 from svlandeg/feature/more-v3-docs [ci skip]	2020-08-19 11:29:02 +02:00
Ines Montani	13291e97ba	Update docs [ci skip]	2020-08-19 00:28:37 +02:00
svlandeg	0d55b6ebb4	formatting	2020-08-18 18:55:56 +02:00
svlandeg	abba639565	Merge remote-tracking branch 'upstream/develop' into feature/more-v3-docs	2020-08-18 18:55:12 +02:00
Sofie Van Landeghem	358cbb21e3	Define candidate generator in EL config (#5876) * candidate generator as separate part of EL config * update comment * ent instead of str as input for candidate generation * Span instead of str: correct type indication * fix types * unit test to create new candidate generator * fix replace_pipe argument passing * move error message, general cleanup * add vocab back to KB constructor * provide KB as callable from Vocab arg * rename to kb_loader, fix KB serialization as part of the EL pipe * fix typo * reformatting * cleanup * fix comment * fix wrongly duplicated code from merge conflict * rename dump to to_disk * from_disk instead of load_bulk * update test after recent removal of set_morphology in tagger * remove old doc	2020-08-18 16:10:36 +02:00
Ines Montani	82f0e20318	Update docs and consistency [ci skip]	2020-08-18 14:39:40 +02:00
svlandeg	705e1cb06c	typo in link	2020-08-18 12:04:05 +02:00
svlandeg	f7b76d2d83	Merge remote-tracking branch 'upstream/develop' into feature/more-v3-docs	2020-08-18 11:57:52 +02:00
Ines Montani	1c3bcfb488	Update docs and util consistency	2020-08-18 01:22:59 +02:00
Ines Montani	728fec0194	Update docs [ci skip]	2020-08-18 00:49:19 +02:00
Ines Montani	990c6b4c32	Update docs and CLI [ci skip]	2020-08-17 21:38:20 +02:00
svlandeg	4fe4bab1c9	typo fixes	2020-08-17 17:10:15 +02:00
svlandeg	da80c18660	merge develop into branch	2020-08-17 16:57:18 +02:00
Ines Montani	3ae5e02f4f	Update docs, types and API consistency	2020-08-17 16:45:24 +02:00
svlandeg	319692aa53	fix typos	2020-08-17 14:05:48 +02:00
Ines Montani	2ac4b0ef3e	Finish Transformer docs [ci skip]	2020-08-16 15:56:32 +02:00
Ines Montani	6ae83bde0c	Fix CLI consistency [ci skip]	2020-08-16 15:46:29 +02:00
Ines Montani	a570c304df	Update quickstart, template and docs	2020-08-15 14:50:29 +02:00
Ines Montani	950832f087	Tidy up pipes (#5906) * Tidy up pipes * Fix init, defaults and raise custom errors * Update docs * Update docs [ci skip] * Apply suggestions from code review Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com> * Tidy up error handling and validation, fix consistency * Simplify get_examples check * Remove unused import [ci skip] Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>	2020-08-11 23:29:31 +02:00
Ines Montani	b7ec06e331	Update docs [ci skip]	2020-08-11 20:57:23 +02:00
Ines Montani	023ba7ae26	Update docs	2020-08-10 17:13:11 +02:00
Ines Montani	c099f6eece	Add Token.lex	2020-08-10 16:43:52 +02:00
Ines Montani	64f2f84098	Update docstrings and docs [ci skip]	2020-08-10 13:45:22 +02:00
Ines Montani	12052bd8f6	Update docs [ci skip]	2020-08-10 01:20:10 +02:00
Ines Montani	0832cdd443	Fix formatting [ci skip]	2020-08-10 00:46:32 +02:00
Ines Montani	d611cbef43	Update docs [ci skip]	2020-08-10 00:42:26 +02:00
Ines Montani	c044460823	Update docs [ci skip]	2020-08-10 00:01:38 +02:00
Ines Montani	d5c78c7a34	Update docs and fix consistency	2020-08-09 22:31:52 +02:00
Ines Montani	a15c5fb191	Update docstrings and docs	2020-08-09 16:10:48 +02:00
Ines Montani	46bc513a4e	Update docs [ci skip]	2020-08-07 20:14:31 +02:00
Ines Montani	fe29ceec9e	Merge branch 'develop' into docs/model-docstrings	2020-08-07 18:42:01 +02:00
Ines Montani	470b6f8073	Update docs	2020-08-07 18:41:15 +02:00
Ines Montani	b7e34c1451	Update docs [ci skip]	2020-08-07 16:13:13 +02:00
Ines Montani	6f3649923c	Merge pull request #5893 from explosion/feature/validate-arg	2020-08-07 15:47:20 +02:00
Adriane Boyd	e962784531	Add Lemmatizer and simplify related components (#5848) * Add Lemmatizer and simplify related components * Add `Lemmatizer` pipe with `lookup` and `rule` modes using the `Lookups` tables. * Reduce `Tagger` to a simple tagger that sets `Token.tag` (no pos or lemma) * Reduce `Morphology` to only keep track of morph tags (no tag map, lemmatizer, or morph rules) * Remove lemmatizer from `Vocab` * Adjust many many tests Differences: * No default lookup lemmas * No special treatment of TAG in `from_array` and similar required * Easier to modify labels in a `Tagger` * No extra strings added from morphology / tag map * Fix test * Initial fix for Lemmatizer config/serialization * Adjust init test to be more generic * Adjust init test to force empty Lookups * Add simple cache to rule-based lemmatizer * Convert language-specific lemmatizers Convert language-specific lemmatizers to component lemmatizers. Remove previous lemmatizer class. * Fix French and Polish lemmatizers * Remove outdated UPOS conversions * Update Russian lemmatizer init in tests * Add minimal init/run tests for custom lemmatizers * Add option to overwrite existing lemmas * Update mode setting, lookup loading, and caching * Make `mode` an immutable property * Only enforce strict `load_lookups` for known supported modes * Move caching into individual `_lemmatize` methods * Implement strict when lang is not found in lookups * Fix tables/lookups in make_lemmatizer * Reallow provided lookups and allow for stricter checks * Add lookups asset to all Lemmatizer pipe tests * Rename lookups in lemmatizer init test * Clean up merge * Refactor lookup table loading * Add helper from `load_lemmatizer_lookups` that loads required and optional lookups tables based on settings provided by a config. Additional slight refactor of lookups: * Add `Lookups.set_table` to set a table from a provided `Table` * Reorder class definitions to be able to specify type as `Table` * Move registry assets into test methods * Refactor lookups tables config Use class methods within `Lemmatizer` to provide the config for particular modes and to load the lookups from a config. * Add pipe and score to lemmatizer * Simplify Tagger.score * Add missing import * Clean up imports and auto-format * Remove unused kwarg * Tidy up and auto-format * Update docstrings for Lemmatizer Update docstrings for Lemmatizer. Additionally modify `is_base_form` API to take `Token` instead of individual features. * Update docstrings * Remove tag map values from Tagger.add_label * Update API docs * Fix relative link in Lemmatizer API docs	2020-08-07 15:27:13 +02:00
Adriane Boyd	4aecccf153	Update API docs for AttributeRuler.__init__	2020-08-07 15:17:25 +02:00
Ines Montani	a8404c3517	validation -> validate	2020-08-07 14:43:47 +02:00
Ines Montani	1d01d89b79	Update CLI docs and evaluate command [ci skip]	2020-08-07 14:40:58 +02:00
Ines Montani	ef2c67cca5	Add DocBin to/from_disk methods and update docs (#5892) * Add DocBin to/from_disk methods and update docs * Use DocBin.from_disk in Corpus	2020-08-07 14:30:59 +02:00
Ines Montani	4ca08c6d5d	Merge pull request #5891 from adrianeboyd/docs/attribute-ruler-api Add AttributeRuler API docs	2020-08-07 13:55:12 +02:00
Adriane Boyd	b8d0c23857	Add AttributeRuler API docs With additional minor updates to AttributeRuler docstrings.	2020-08-07 12:43:23 +02:00
svlandeg	824f4b2107	casing consistent	2020-08-06 23:20:13 +02:00
svlandeg	b17db0e994	Merge remote-tracking branch 'upstream/develop' into feature/el-docs # Conflicts: # website/docs/usage/training.md	2020-08-06 19:48:52 +02:00
svlandeg	49ddeb99ea	add textcat architectures documentation	2020-08-06 19:44:47 +02:00
Ines Montani	e5995904d6	Update docs	2020-08-06 19:30:43 +02:00
svlandeg	e8fd0c1f1e	EL architectures documentation	2020-08-06 17:41:26 +02:00
svlandeg	f396f091dc	update EL API	2020-08-06 16:40:48 +02:00
svlandeg	81d0b1c390	update EL pipe arguments	2020-08-06 16:22:50 +02:00
svlandeg	0b4d1e1bc4	'debug data' instead of 'debug-data'	2020-08-06 15:47:31 +02:00

564 Commits