spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-02-16 20:20:41 +03:00

Author	SHA1	Message	Date
Ines Montani	c7c9b0451f	Update docs [ci skip]	2020-08-22 13:52:52 +02:00
Ines Montani	71aeae89c5	Merge pull request #5948 from svlandeg/feature/docs-docs-docs [ci skip]	2020-08-22 12:18:47 +02:00
Ines Montani	f102164a1f	Update docs [ci skip]	2020-08-21 19:34:06 +02:00
svlandeg	1b7cfa7347	Merge remote-tracking branch 'upstream/develop' into feature/docs-docs-docs	2020-08-21 18:36:18 +02:00
svlandeg	dc98f69b57	alphabetize registries	2020-08-21 18:10:21 +02:00
svlandeg	518a1f97f3	remove outdated TODO's	2020-08-21 17:55:15 +02:00
svlandeg	e92bd6e1c1	alphabetize training lists	2020-08-21 17:42:19 +02:00
Ines Montani	74cb6d39d0	Update docs [ci skip]	2020-08-21 16:11:38 +02:00
Matthew Honnibal	f5bcc10268	Update architectures	2020-08-21 15:34:54 +02:00
Matthew Honnibal	7ed8f4504b	Update API docs for architectures	2020-08-21 15:22:19 +02:00
Ines Montani	52bd3a8b48	Update docs [ci skip]	2020-08-21 13:22:59 +02:00
Ines Montani	e60442d83a	Adjust label casing in displaCy NER visualizer (resolves #4866) - Accept any case for label names in ents and colors option, even if actual predicted label uses different casing - Don't text-transform: uppercase visually, if it's important to users that the label is represented as-is in the UI	2020-08-21 11:51:31 +02:00
Ines Montani	04e4d59235	Update docs [ci skip]	2020-08-20 16:17:25 +02:00
Sofie Van Landeghem	410b54e10e	Update website/docs/api/data-formats.md Co-authored-by: Ines Montani <ines@ines.io>	2020-08-20 11:15:34 +02:00
svlandeg	ae719b354f	fix typos	2020-08-20 10:20:40 +02:00
svlandeg	f728c00cbb	Merge remote-tracking branch 'upstream/develop' into feature/update-more-docs # Conflicts: # website/docs/api/data-formats.md	2020-08-20 10:02:13 +02:00
svlandeg	229033831a	add explanation of raw_text	2020-08-20 10:00:45 +02:00
Ines Montani	ea6640ea72	Merge pull request #5939 from explosion/feature/thinc-v8.0.0a28 Update Thinc and config variables	2020-08-19 21:14:36 +02:00
svlandeg	09f3cfc985	add version	2020-08-19 19:58:45 +02:00
svlandeg	7d9f00bdbf	waltzing schedule	2020-08-19 19:53:00 +02:00
Ines Montani	3dd390b1a1	Update Thinc and config variables	2020-08-19 19:46:12 +02:00
svlandeg	85b39639e1	small fix	2020-08-19 19:17:36 +02:00
svlandeg	169b5bcda0	Merge remote-tracking branch 'upstream/develop' into feature/update-docs # Conflicts: # website/docs/usage/training.md	2020-08-19 17:58:25 +02:00
svlandeg	7119295a8a	badgers intro	2020-08-19 17:53:22 +02:00
svlandeg	648499157a	rename "custom models" to "custom functions"	2020-08-19 16:53:51 +02:00
Ines Montani	63921161c8	Update docs [ci skip]	2020-08-19 16:04:21 +02:00
svlandeg	60fedb8518	fix 2 more API lines	2020-08-19 14:55:32 +02:00
svlandeg	2dfd919585	add kb_loader and get_candidates back to EL API	2020-08-19 14:52:49 +02:00
Ines Montani	225f8866a1	Fix consistency	2020-08-19 12:47:57 +02:00
Ines Montani	2285e59765	Merge pull request #5933 from svlandeg/feature/more-v3-docs [ci skip]	2020-08-19 11:29:02 +02:00
Ines Montani	13291e97ba	Update docs [ci skip]	2020-08-19 00:28:37 +02:00
svlandeg	0d55b6ebb4	formatting	2020-08-18 18:55:56 +02:00
svlandeg	abba639565	Merge remote-tracking branch 'upstream/develop' into feature/more-v3-docs	2020-08-18 18:55:12 +02:00
Sofie Van Landeghem	358cbb21e3	Define candidate generator in EL config (#5876) * candidate generator as separate part of EL config * update comment * ent instead of str as input for candidate generation * Span instead of str: correct type indication * fix types * unit test to create new candidate generator * fix replace_pipe argument passing * move error message, general cleanup * add vocab back to KB constructor * provide KB as callable from Vocab arg * rename to kb_loader, fix KB serialization as part of the EL pipe * fix typo * reformatting * cleanup * fix comment * fix wrongly duplicated code from merge conflict * rename dump to to_disk * from_disk instead of load_bulk * update test after recent removal of set_morphology in tagger * remove old doc	2020-08-18 16:10:36 +02:00
Ines Montani	82f0e20318	Update docs and consistency [ci skip]	2020-08-18 14:39:40 +02:00
svlandeg	705e1cb06c	typo in link	2020-08-18 12:04:05 +02:00
svlandeg	f7b76d2d83	Merge remote-tracking branch 'upstream/develop' into feature/more-v3-docs	2020-08-18 11:57:52 +02:00
Ines Montani	1c3bcfb488	Update docs and util consistency	2020-08-18 01:22:59 +02:00
Ines Montani	728fec0194	Update docs [ci skip]	2020-08-18 00:49:19 +02:00
Ines Montani	990c6b4c32	Update docs and CLI [ci skip]	2020-08-17 21:38:20 +02:00
svlandeg	4fe4bab1c9	typo fixes	2020-08-17 17:10:15 +02:00
svlandeg	da80c18660	merge develop into branch	2020-08-17 16:57:18 +02:00
Ines Montani	3ae5e02f4f	Update docs, types and API consistency	2020-08-17 16:45:24 +02:00
svlandeg	319692aa53	fix typos	2020-08-17 14:05:48 +02:00
Ines Montani	2ac4b0ef3e	Finish Transformer docs [ci skip]	2020-08-16 15:56:32 +02:00
Ines Montani	6ae83bde0c	Fix CLI consistency [ci skip]	2020-08-16 15:46:29 +02:00
Ines Montani	a570c304df	Update quickstart, template and docs	2020-08-15 14:50:29 +02:00
Ines Montani	950832f087	Tidy up pipes (#5906) * Tidy up pipes * Fix init, defaults and raise custom errors * Update docs * Update docs [ci skip] * Apply suggestions from code review Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com> * Tidy up error handling and validation, fix consistency * Simplify get_examples check * Remove unused import [ci skip] Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>	2020-08-11 23:29:31 +02:00
Ines Montani	b7ec06e331	Update docs [ci skip]	2020-08-11 20:57:23 +02:00
Ines Montani	023ba7ae26	Update docs	2020-08-10 17:13:11 +02:00
Ines Montani	c099f6eece	Add Token.lex	2020-08-10 16:43:52 +02:00
Ines Montani	64f2f84098	Update docstrings and docs [ci skip]	2020-08-10 13:45:22 +02:00
Ines Montani	12052bd8f6	Update docs [ci skip]	2020-08-10 01:20:10 +02:00
Ines Montani	0832cdd443	Fix formatting [ci skip]	2020-08-10 00:46:32 +02:00
Ines Montani	d611cbef43	Update docs [ci skip]	2020-08-10 00:42:26 +02:00
Ines Montani	c044460823	Update docs [ci skip]	2020-08-10 00:01:38 +02:00
Ines Montani	d5c78c7a34	Update docs and fix consistency	2020-08-09 22:31:52 +02:00
Ines Montani	a15c5fb191	Update docstrings and docs	2020-08-09 16:10:48 +02:00
Ines Montani	46bc513a4e	Update docs [ci skip]	2020-08-07 20:14:31 +02:00
Ines Montani	fe29ceec9e	Merge branch 'develop' into docs/model-docstrings	2020-08-07 18:42:01 +02:00
Ines Montani	470b6f8073	Update docs	2020-08-07 18:41:15 +02:00
Ines Montani	b7e34c1451	Update docs [ci skip]	2020-08-07 16:13:13 +02:00
Ines Montani	6f3649923c	Merge pull request #5893 from explosion/feature/validate-arg	2020-08-07 15:47:20 +02:00
Adriane Boyd	e962784531	Add Lemmatizer and simplify related components (#5848) * Add Lemmatizer and simplify related components * Add `Lemmatizer` pipe with `lookup` and `rule` modes using the `Lookups` tables. * Reduce `Tagger` to a simple tagger that sets `Token.tag` (no pos or lemma) * Reduce `Morphology` to only keep track of morph tags (no tag map, lemmatizer, or morph rules) * Remove lemmatizer from `Vocab` * Adjust many many tests Differences: * No default lookup lemmas * No special treatment of TAG in `from_array` and similar required * Easier to modify labels in a `Tagger` * No extra strings added from morphology / tag map * Fix test * Initial fix for Lemmatizer config/serialization * Adjust init test to be more generic * Adjust init test to force empty Lookups * Add simple cache to rule-based lemmatizer * Convert language-specific lemmatizers Convert language-specific lemmatizers to component lemmatizers. Remove previous lemmatizer class. * Fix French and Polish lemmatizers * Remove outdated UPOS conversions * Update Russian lemmatizer init in tests * Add minimal init/run tests for custom lemmatizers * Add option to overwrite existing lemmas * Update mode setting, lookup loading, and caching * Make `mode` an immutable property * Only enforce strict `load_lookups` for known supported modes * Move caching into individual `_lemmatize` methods * Implement strict when lang is not found in lookups * Fix tables/lookups in make_lemmatizer * Reallow provided lookups and allow for stricter checks * Add lookups asset to all Lemmatizer pipe tests * Rename lookups in lemmatizer init test * Clean up merge * Refactor lookup table loading * Add helper from `load_lemmatizer_lookups` that loads required and optional lookups tables based on settings provided by a config. Additional slight refactor of lookups: * Add `Lookups.set_table` to set a table from a provided `Table` * Reorder class definitions to be able to specify type as `Table` * Move registry assets into test methods * Refactor lookups tables config Use class methods within `Lemmatizer` to provide the config for particular modes and to load the lookups from a config. * Add pipe and score to lemmatizer * Simplify Tagger.score * Add missing import * Clean up imports and auto-format * Remove unused kwarg * Tidy up and auto-format * Update docstrings for Lemmatizer Update docstrings for Lemmatizer. Additionally modify `is_base_form` API to take `Token` instead of individual features. * Update docstrings * Remove tag map values from Tagger.add_label * Update API docs * Fix relative link in Lemmatizer API docs	2020-08-07 15:27:13 +02:00
Adriane Boyd	4aecccf153	Update API docs for AttributeRuler.__init__	2020-08-07 15:17:25 +02:00
Ines Montani	a8404c3517	validation -> validate	2020-08-07 14:43:47 +02:00
Ines Montani	1d01d89b79	Update CLI docs and evaluate command [ci skip]	2020-08-07 14:40:58 +02:00
Ines Montani	ef2c67cca5	Add DocBin to/from_disk methods and update docs (#5892) * Add DocBin to/from_disk methods and update docs * Use DocBin.from_disk in Corpus	2020-08-07 14:30:59 +02:00
Ines Montani	4ca08c6d5d	Merge pull request #5891 from adrianeboyd/docs/attribute-ruler-api Add AttributeRuler API docs	2020-08-07 13:55:12 +02:00
Adriane Boyd	b8d0c23857	Add AttributeRuler API docs With additional minor updates to AttributeRuler docstrings.	2020-08-07 12:43:23 +02:00
svlandeg	824f4b2107	casing consistent	2020-08-06 23:20:13 +02:00
svlandeg	b17db0e994	Merge remote-tracking branch 'upstream/develop' into feature/el-docs # Conflicts: # website/docs/usage/training.md	2020-08-06 19:48:52 +02:00
svlandeg	49ddeb99ea	add textcat architectures documentation	2020-08-06 19:44:47 +02:00
Ines Montani	e5995904d6	Update docs	2020-08-06 19:30:43 +02:00
svlandeg	e8fd0c1f1e	EL architectures documentation	2020-08-06 17:41:26 +02:00
svlandeg	f396f091dc	update EL API	2020-08-06 16:40:48 +02:00
svlandeg	81d0b1c390	update EL pipe arguments	2020-08-06 16:22:50 +02:00
svlandeg	0b4d1e1bc4	'debug data' instead of 'debug-data'	2020-08-06 15:47:31 +02:00
svlandeg	881e3f8fd0	add docbin explanation and example	2020-08-06 15:29:44 +02:00
Ines Montani	5d417d3b19	WIP: Update docs [ci skip]	2020-08-06 13:10:15 +02:00
Ines Montani	06e80d95cd	Sync develop with nightly docs state (#5883) Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>	2020-08-06 00:28:14 +02:00
Ines Montani	5cc0d89fad	Simplify config overrides in CLI and deserialization (#5880)	2020-08-05 23:35:09 +02:00
Ines Montani	50311a4d37	Update docs [ci skip]	2020-08-05 20:29:53 +02:00
Ines Montani	2a4d56e730	Update docs	2020-08-05 15:01:00 +02:00
Ines Montani	cdec46493f	Update docs	2020-08-05 15:00:54 +02:00
Adriane Boyd	c62fd878a3	Allow Doc.char_span to snap to token boundaries (#5849) * Allow Doc.char_span to snap to token boundaries Add a `mode` option to allow `Doc.char_span` to snap to token boundaries. The `mode` options: * `strict`: character offsets must match token boundaries (default, same as before) * `inside`: all tokens completely within the character span * `outside`: all tokens at least partially covered by the character span Add a new helper function `token_by_char` that returns the token corresponding to a character position in the text. Update `token_by_start` and `token_by_end` to use `token_by_char` for more efficient searching. * Remove unused import * Rename mode to alignment_mode Rename `mode` to `alignment_mode` with the options `strict`/`contract`/`expand`. Any unrecognized modes are silently converted to `strict`.	2020-08-04 13:36:32 +02:00
Ines Montani	4c055f0aa7	Add init CLI and init config (#5854) * Add init CLI and init config draft * Improve config validation * Auto-format * Don't export anything in debug config * Update docs	2020-08-02 15:18:30 +02:00
Ines Montani	b40f44419b	Simplify pipe analysis - remove unused code - don't print by default - integrate attrs info into analysis output	2020-08-01 13:40:06 +02:00
Ines Montani	98c6a85c8b	Update docs [ci skip]	2020-07-31 18:55:38 +02:00
Ines Montani	e9e8fa2466	Update docs and types	2020-07-31 17:02:54 +02:00
Ines Montani	5a221f79c2	Revert "Remove keyword-only from Scorer API docs" [ci skip] This reverts commit `7a6ac47dc1`.	2020-07-31 14:00:21 +02:00
Adriane Boyd	9b509aa87f	Move Language.evaluate scorer config to new arg Move `Language.evaluate` scorer config from `component_cfg` to separate argument `scorer_cfg`.	2020-07-31 11:05:16 +02:00
Adriane Boyd	9d79916792	Merge branch 'develop' into feature/scorer-adjustments	2020-07-31 10:48:14 +02:00
Ines Montani	9c80cb673d	Update docs [ci skip]	2020-07-29 19:41:34 +02:00
Ines Montani	9f69afdd1e	Update docs [ci skip]	2020-07-29 19:09:44 +02:00
Ines Montani	6a5c853edb	Fix docs [ci skip]	2020-07-29 18:45:12 +02:00
Ines Montani	158d8c1e48	Update docs [ci skip]	2020-07-29 18:44:10 +02:00
Ines Montani	b0f57a0cac	Update docs and consistency	2020-07-29 15:14:07 +02:00
Ines Montani	e0ffe36e79	Update docstrings, docs and types	2020-07-29 11:36:42 +02:00
Adriane Boyd	7a6ac47dc1	Remove keyword-only from Scorer API docs	2020-07-29 10:40:30 +02:00

542 Commits