spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-04-12 05:04:15 +03:00

Author	SHA1	Message	Date
Matthew Honnibal	8038b87f04	Various small tweaks to project CLI (#5965) * Fix up/download of http and local paths * Support git_sparse_checkout for assets * Fix scorer * Handle already-present directories for git assets * Improve convert command * Fix support for existant files in git assets * Support branches in git sparse checkout * Format * Fix git assets * Document git block in assets * Fix test * Fix test * Revert "Fix test" This reverts commit `cf3097260f`. * Revert "Fix test" This reverts commit `964d636e27`. * Dont multiply p/r/f by 100 * Display scores * 100 during training	2020-08-25 00:30:52 +02:00
Adriane Boyd	abd3f2b65a	Rename Polish lemmatizer method (#5960) Rename Polish lemmatizer method to `pos_lookup` to distinguish it from pure token-based lookup methods.	2020-08-25 00:22:27 +02:00
Ines Montani	e12b03358b	Support removing extra values in fill-config (#5966) * Support removing extra values in fill-config * Fix test	2020-08-24 22:53:47 +02:00
Matthew Honnibal	f232d8db96	Report p/r/f out of 100	2020-08-24 17:17:23 +02:00
Ines Montani	0e7f99da58	Fix handling of optional [pretraining] block (#5954) * Fix handling of optional [pretraining] block * Remote pretraining from default config * Fix test * Add schema option for empty pretrain block	2020-08-24 15:56:03 +02:00
Matthew Honnibal	64df37643f	Update lockfile after project pull	2020-08-24 03:27:09 +02:00
Matthew Honnibal	588c28fe45	Fix project pull when deps missing	2020-08-24 01:23:36 +02:00
Matthew Honnibal	001546c19e	Set version to v3.0.0a10	2020-08-23 21:15:38 +02:00
Matthew Honnibal	160a855246	Format	2020-08-23 21:15:12 +02:00
Matthew Honnibal	89f5b8abb3	Fix project push	2020-08-23 21:14:44 +02:00
Matthew Honnibal	3828bc3ed0	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2020-08-23 18:32:24 +02:00
Matthew Honnibal	e559867605	Allow spacy project to push and pull to/from remote storage (#5949) * Add utils for working with remote storage * WIP add remote_cache for project * WIP add push and pull commands * Use pathy in remote_cache * Updarte util * Update remote_cache * Update util * Update project assets * Update pull script * Update push script * Fix type annotation in util * Work on remote storage * Remove site and env hash * Fix imports * Fix type annotation * Require pathy * Require pathy * Fix import * Add a util to handle project variable substitution * Import push and pull commands * Fix pull command * Fix push command * Fix tarfile in remote_storage * Improve printing * Fiddle with status messages * Set version to v3.0.0a9 * Draft docs for spacy project remote storages * Update docs [ci skip] * Use Thinc config to simplify and unify template variables * Auto-format * Don't import Pathy globally for now Causes slow and annoying Google Cloud warning * Tidy up test * Tidy up and update tests * Update to latest Thinc * Update docs * variables -> vars * Update docs [ci skip] * Update docs [ci skip] Co-authored-by: Ines Montani <ines@ines.io>	2020-08-23 18:32:09 +02:00
Matthew Honnibal	fe1cf7e124	Allow score_weights to list extra scores	2020-08-23 18:31:30 +02:00
Ines Montani	9bdc9e81f5	Fix error message [ci skip]	2020-08-23 12:14:02 +02:00
svlandeg	af36d77d01	fix typo in docstring	2020-08-21 15:56:03 +02:00
svlandeg	3060e4ae65	Merge remote-tracking branch 'upstream/develop' into feature/docs-docs-docs # Conflicts: # website/src/widgets/quickstart-training-generator.js	2020-08-21 15:16:30 +02:00
svlandeg	cc926267f8	small fixes	2020-08-21 15:05:40 +02:00
Ines Montani	aa6a7cd6e7	Update docs and consistency [ci skip]	2020-08-21 13:49:18 +02:00
Ines Montani	3826cfb8fe	Merge pull request #5930 from svlandeg/feature/init-config-fix UX for init config	2020-08-21 12:06:33 +02:00
Ines Montani	79af7dcd6d	Small wording adjustments [ci skip]	2020-08-21 12:06:19 +02:00
Ines Montani	e60442d83a	Adjust label casing in displaCy NER visualizer (resolves #4866) - Accept any case for label names in ents and colors option, even if actual predicted label uses different casing - Don't text-transform: uppercase visually, if it's important to users that the label is represented as-is in the UI	2020-08-21 11:51:31 +02:00
Matthew Honnibal	c356e62908	Minor adjustments to quickstart template	2020-08-21 00:10:21 +02:00
Ines Montani	6ad59d59fe	Merge branch 'develop' of https://github.com/explosion/spaCy into develop [ci skip]	2020-08-20 11:20:58 +02:00
Ines Montani	ea6640ea72	Merge pull request #5939 from explosion/feature/thinc-v8.0.0a28 Update Thinc and config variables	2020-08-19 21:14:36 +02:00
Ines Montani	3dd390b1a1	Update Thinc and config variables	2020-08-19 19:46:12 +02:00
svlandeg	b96cd9fa5e	fix typo	2020-08-19 18:46:08 +02:00
Ines Montani	e2f2ef3a5a	Update init config and recommendations - As much as I dislike YAML, it seemed like a better format here because it allows us to add comments if we want to explain the different recommendations - Don't include the generated JS in the repo by default and build it on the fly when running or deploying the site. This ensures it's always up to date. - Simplify jinja_to_js script and use fewer dependencies	2020-08-19 13:33:15 +02:00
Ines Montani	2285e59765	Merge pull request #5933 from svlandeg/feature/more-v3-docs [ci skip]	2020-08-19 11:29:02 +02:00
Matthew Honnibal	c0f6e77a41	Set version to v3.0.0a8	2020-08-18 23:29:00 +02:00
svlandeg	a8acedd4ba	example of custom reader and batcher	2020-08-18 19:15:16 +02:00
Sofie Van Landeghem	358cbb21e3	Define candidate generator in EL config (#5876) * candidate generator as separate part of EL config * update comment * ent instead of str as input for candidate generation * Span instead of str: correct type indication * fix types * unit test to create new candidate generator * fix replace_pipe argument passing * move error message, general cleanup * add vocab back to KB constructor * provide KB as callable from Vocab arg * rename to kb_loader, fix KB serialization as part of the EL pipe * fix typo * reformatting * cleanup * fix comment * fix wrongly duplicated code from merge conflict * rename dump to to_disk * from_disk instead of load_bulk * update test after recent removal of set_morphology in tagger * remove old doc	2020-08-18 16:10:36 +02:00
Sofie Van Landeghem	688e77562b	Train CLI script fixes (#5931) * fix dash replacement in overrides arguments * perform interpolation on training config * make sure only .spacy files are read	2020-08-18 16:06:37 +02:00
Ines Montani	82f0e20318	Update docs and consistency [ci skip]	2020-08-18 14:39:40 +02:00
svlandeg	10e67b400c	output_file required, spacy-transformers prefered instead of required	2020-08-18 13:38:43 +02:00
Ines Montani	1c3bcfb488	Update docs and util consistency	2020-08-18 01:22:59 +02:00
Ines Montani	990c6b4c32	Update docs and CLI [ci skip]	2020-08-17 21:38:20 +02:00
Ines Montani	3ae5e02f4f	Update docs, types and API consistency	2020-08-17 16:45:24 +02:00
Matthew Honnibal	a95a36ce2a	Set version to v3.0.0a7	2020-08-16 15:51:05 +02:00
Ines Montani	6ae83bde0c	Fix CLI consistency [ci skip]	2020-08-16 15:46:29 +02:00
Ines Montani	45f13cbf64	Merge pull request #5916 from explosion/feature/new-thinc-config	2020-08-16 15:24:12 +02:00
Ines Montani	34bda91695	Show warnings if there's nothing to auto-fill	2020-08-16 14:19:43 +02:00
Ines Montani	dd5804d499	Update type hints	2020-08-16 14:19:33 +02:00
Ines Montani	a570c304df	Update quickstart, template and docs	2020-08-15 14:50:29 +02:00
Ines Montani	3272a63430	Merge pull request #5920 from explosion/fix/logging-warning-various	2020-08-15 14:41:15 +02:00
Ines Montani	fdcde9b0bf	Add init fill-config	2020-08-14 16:49:26 +02:00
Matthew Honnibal	9ebf39fb5f	Relax test	2020-08-14 16:31:09 +02:00
Ines Montani	8128e5eb35	Replace lexeme_norm warning with logging	2020-08-14 15:00:52 +02:00
Ines Montani	37814b608d	Remove env_opt and simplfy default Optimizer	2020-08-14 14:59:54 +02:00
Ines Montani	ab1d165bba	Pass optimizer defined in config to resume/begin_training Otherwise, this would create a default optimizer, which isn't what we want?	2020-08-14 14:59:22 +02:00
Ines Montani	e4d0990857	Only receive from listener if listener exists	2020-08-14 14:58:48 +02:00
Ines Montani	cef97e4b63	Fix path check	2020-08-14 14:58:18 +02:00
Ines Montani	db2dbc8e59	Remove unused warning	2020-08-14 14:58:03 +02:00
Ines Montani	67cc39af7f	Update Thinc and include section order	2020-08-14 14:06:22 +02:00
Ines Montani	88b0a96801	Update for new Thinc and adjust config	2020-08-13 17:38:30 +02:00
graue70	ba84371ab0	Use init parameter (#5909)	2020-08-11 23:41:58 +02:00
Ines Montani	950832f087	Tidy up pipes (#5906) * Tidy up pipes * Fix init, defaults and raise custom errors * Update docs * Update docs [ci skip] * Apply suggestions from code review Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com> * Tidy up error handling and validation, fix consistency * Simplify get_examples check * Remove unused import [ci skip] Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>	2020-08-11 23:29:31 +02:00
Ines Montani	f79e4c094d	Remove generic type Seems to cause error on Python 3.8 with Cython?	2020-08-10 17:24:30 +02:00
Ines Montani	c099f6eece	Add Token.lex	2020-08-10 16:43:52 +02:00
Ines Montani	933a7cf8d1	Fix Lexeme.from_ptr	2020-08-10 16:43:37 +02:00
Ines Montani	64f2f84098	Update docstrings and docs [ci skip]	2020-08-10 13:45:22 +02:00
Ines Montani	a4b448eec4	Remove unused compiler flag	2020-08-10 13:13:18 +02:00
Ines Montani	3eaeb73342	Tidy up and auto-format	2020-08-09 22:36:23 +02:00
Ines Montani	d5c78c7a34	Update docs and fix consistency	2020-08-09 22:31:52 +02:00
Ines Montani	7c6854d8d4	Fix missing imports	2020-08-09 22:28:29 +02:00
Matthew Honnibal	0fc13b2f14	Set version to v3.0.0a6	2020-08-09 21:53:32 +02:00
Ines Montani	a15c5fb191	Update docstrings and docs	2020-08-09 16:10:48 +02:00
Ines Montani	8d2baa153d	Update tokenizer docs and add test	2020-08-09 15:24:01 +02:00
Matthew Honnibal	134d933d67	Add docstring for entity linker factory	2020-08-09 15:19:28 +02:00
Matthew Honnibal	992ee1c02f	Update tagger docstring	2020-08-09 15:09:31 +02:00
Matthew Honnibal	ebf9a7acbf	Add textcat docstring	2020-08-09 15:07:09 +02:00
Matthew Honnibal	8a13f510d6	Update tests	2020-08-09 15:01:16 +02:00
Matthew Honnibal	bbd8acd4bf	Add docstrings for parser and NER. Simplify some arguments	2020-08-09 14:46:13 +02:00
Matthew Honnibal	39a3d64c01	Add docstrings for Tok2Vec component	2020-08-09 00:48:03 +02:00
Ines Montani	fd20f84927	Merge pull request #5895 from explosion/docs/batchers Draft docstrings for batchers	2020-08-07 20:07:10 +02:00
Matthew Honnibal	f5c4e0b751	Add docstrings for batchers	2020-08-07 18:51:02 +02:00
Ines Montani	fe29ceec9e	Merge branch 'develop' into docs/model-docstrings	2020-08-07 18:42:01 +02:00
Ines Montani	3a193eb8f1	Fix imports, types and default configs	2020-08-07 18:40:54 +02:00
Matthew Honnibal	b1d83fc13e	Fix imports	2020-08-07 16:55:54 +02:00
Matthew Honnibal	473504d837	Format	2020-08-07 16:49:00 +02:00
Matthew Honnibal	234c52a91e	Add tok2vec docstrings	2020-08-07 16:48:48 +02:00
Matthew Honnibal	547bc8a82b	Add docstring notes	2020-08-07 16:17:34 +02:00
Ines Montani	6f3649923c	Merge pull request #5893 from explosion/feature/validate-arg	2020-08-07 15:47:20 +02:00
Adriane Boyd	e962784531	Add Lemmatizer and simplify related components (#5848) * Add Lemmatizer and simplify related components * Add `Lemmatizer` pipe with `lookup` and `rule` modes using the `Lookups` tables. * Reduce `Tagger` to a simple tagger that sets `Token.tag` (no pos or lemma) * Reduce `Morphology` to only keep track of morph tags (no tag map, lemmatizer, or morph rules) * Remove lemmatizer from `Vocab` * Adjust many many tests Differences: * No default lookup lemmas * No special treatment of TAG in `from_array` and similar required * Easier to modify labels in a `Tagger` * No extra strings added from morphology / tag map * Fix test * Initial fix for Lemmatizer config/serialization * Adjust init test to be more generic * Adjust init test to force empty Lookups * Add simple cache to rule-based lemmatizer * Convert language-specific lemmatizers Convert language-specific lemmatizers to component lemmatizers. Remove previous lemmatizer class. * Fix French and Polish lemmatizers * Remove outdated UPOS conversions * Update Russian lemmatizer init in tests * Add minimal init/run tests for custom lemmatizers * Add option to overwrite existing lemmas * Update mode setting, lookup loading, and caching * Make `mode` an immutable property * Only enforce strict `load_lookups` for known supported modes * Move caching into individual `_lemmatize` methods * Implement strict when lang is not found in lookups * Fix tables/lookups in make_lemmatizer * Reallow provided lookups and allow for stricter checks * Add lookups asset to all Lemmatizer pipe tests * Rename lookups in lemmatizer init test * Clean up merge * Refactor lookup table loading * Add helper from `load_lemmatizer_lookups` that loads required and optional lookups tables based on settings provided by a config. Additional slight refactor of lookups: * Add `Lookups.set_table` to set a table from a provided `Table` * Reorder class definitions to be able to specify type as `Table` * Move registry assets into test methods * Refactor lookups tables config Use class methods within `Lemmatizer` to provide the config for particular modes and to load the lookups from a config. * Add pipe and score to lemmatizer * Simplify Tagger.score * Add missing import * Clean up imports and auto-format * Remove unused kwarg * Tidy up and auto-format * Update docstrings for Lemmatizer Update docstrings for Lemmatizer. Additionally modify `is_base_form` API to take `Token` instead of individual features. * Update docstrings * Remove tag map values from Tagger.add_label * Update API docs * Fix relative link in Lemmatizer API docs	2020-08-07 15:27:13 +02:00
Matthew Honnibal	da6e59519e	Add docstrings for simple_ner	2020-08-07 15:09:49 +02:00
Matthew Honnibal	7ef8a64df9	Add docstring for parser	2020-08-07 14:59:34 +02:00
Ines Montani	fc9a4fe827	Update attribute ruler	2020-08-07 14:43:55 +02:00
Ines Montani	a8404c3517	validation -> validate	2020-08-07 14:43:47 +02:00
Ines Montani	1d01d89b79	Update CLI docs and evaluate command [ci skip]	2020-08-07 14:40:58 +02:00
Ines Montani	ef2c67cca5	Add DocBin to/from_disk methods and update docs (#5892) * Add DocBin to/from_disk methods and update docs * Use DocBin.from_disk in Corpus	2020-08-07 14:30:59 +02:00
Ines Montani	4ca08c6d5d	Merge pull request #5891 from adrianeboyd/docs/attribute-ruler-api Add AttributeRuler API docs	2020-08-07 13:55:12 +02:00
Adriane Boyd	b8d0c23857	Add AttributeRuler API docs With additional minor updates to AttributeRuler docstrings.	2020-08-07 12:43:23 +02:00
svlandeg	b17db0e994	Merge remote-tracking branch 'upstream/develop' into feature/el-docs # Conflicts: # website/docs/usage/training.md	2020-08-06 19:48:52 +02:00
Adriane Boyd	06c3a5e048	Add pipe to AttributeRuler (#5889)	2020-08-06 19:43:09 +02:00
Ines Montani	9b7f198390	Fix format	2020-08-06 19:30:53 +02:00
Ines Montani	3c4389110d	Remove unused imports	2020-08-06 19:30:47 +02:00
Matthew Honnibal	d4525816ef	Be less choosy about reporting textcat scores (#5879) * Set textcat scores more consistently * Refactor textcat scores * Fixes to scorer * Add comments * Add threshold * Rename just 'f' to micro_f in textcat scorer * Fix textcat score for two-class * Fix syntax * Fix textcat score * Fix docstring	2020-08-06 16:24:13 +02:00
svlandeg	0b4d1e1bc4	'debug data' instead of 'debug-data'	2020-08-06 15:47:31 +02:00
svlandeg	881e3f8fd0	add docbin explanation and example	2020-08-06 15:29:44 +02:00
Adriane Boyd	5e683a6e46	Fix return values for per feat score (#5885) * Fix return values for per feat score Convert `PRFScore` to dict as other per type scores. * Update tests accordingly	2020-08-06 15:14:47 +02:00
Ines Montani	913d21f0a3	Merge pull request #5882 from explosion/feature/raise-from Use "raise ... from" in custom errors for better tracebacks	2020-08-06 00:35:26 +02:00

7568 Commits