spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-10-04 02:46:40 +03:00

Author	SHA1	Message	Date
Ines Montani	8128e5eb35	Replace lexeme_norm warning with logging	2020-08-14 15:00:52 +02:00
Ines Montani	37814b608d	Remove env_opt and simplify default Optimizer	2020-08-14 14:59:54 +02:00
Ines Montani	ab1d165bba	Pass optimizer defined in config to resume/begin_training. Otherwise, this would create a default optimizer, which isn't what we want?	2020-08-14 14:59:22 +02:00
Ines Montani	e4d0990857	Only receive from listener if listener exists	2020-08-14 14:58:48 +02:00
Ines Montani	cef97e4b63	Fix path check	2020-08-14 14:58:18 +02:00
Ines Montani	db2dbc8e59	Remove unused warning	2020-08-14 14:58:03 +02:00
graue70	ba84371ab0	Use init parameter (#5909)	2020-08-11 23:41:58 +02:00
Ines Montani	950832f087	Tidy up pipes (#5906) * Tidy up pipes * Fix init, defaults and raise custom errors * Update docs * Update docs [ci skip] * Apply suggestions from code review Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com> * Tidy up error handling and validation, fix consistency * Simplify get_examples check * Remove unused import [ci skip] Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>	2020-08-11 23:29:31 +02:00
Ines Montani	f79e4c094d	Remove generic type. Seems to cause error on Python 3.8 with Cython?	2020-08-10 17:24:30 +02:00
Ines Montani	c099f6eece	Add Token.lex	2020-08-10 16:43:52 +02:00
Ines Montani	933a7cf8d1	Fix Lexeme.from_ptr	2020-08-10 16:43:37 +02:00
Ines Montani	64f2f84098	Update docstrings and docs [ci skip]	2020-08-10 13:45:22 +02:00
Ines Montani	a4b448eec4	Remove unused compiler flag	2020-08-10 13:13:18 +02:00
Ines Montani	3eaeb73342	Tidy up and auto-format	2020-08-09 22:36:23 +02:00
Ines Montani	d5c78c7a34	Update docs and fix consistency	2020-08-09 22:31:52 +02:00
Ines Montani	7c6854d8d4	Fix missing imports	2020-08-09 22:28:29 +02:00
Matthew Honnibal	0fc13b2f14	Set version to v3.0.0a6	2020-08-09 21:53:32 +02:00
Ines Montani	a15c5fb191	Update docstrings and docs	2020-08-09 16:10:48 +02:00
Ines Montani	8d2baa153d	Update tokenizer docs and add test	2020-08-09 15:24:01 +02:00
Matthew Honnibal	134d933d67	Add docstring for entity linker factory	2020-08-09 15:19:28 +02:00
Matthew Honnibal	992ee1c02f	Update tagger docstring	2020-08-09 15:09:31 +02:00
Matthew Honnibal	ebf9a7acbf	Add textcat docstring	2020-08-09 15:07:09 +02:00
Matthew Honnibal	8a13f510d6	Update tests	2020-08-09 15:01:16 +02:00
Matthew Honnibal	bbd8acd4bf	Add docstrings for parser and NER. Simplify some arguments	2020-08-09 14:46:13 +02:00
Matthew Honnibal	39a3d64c01	Add docstrings for Tok2Vec component	2020-08-09 00:48:03 +02:00
Ines Montani	fd20f84927	Merge pull request #5895 from explosion/docs/batchers: Draft docstrings for batchers	2020-08-07 20:07:10 +02:00
Matthew Honnibal	f5c4e0b751	Add docstrings for batchers	2020-08-07 18:51:02 +02:00
Ines Montani	fe29ceec9e	Merge branch 'develop' into docs/model-docstrings	2020-08-07 18:42:01 +02:00
Ines Montani	3a193eb8f1	Fix imports, types and default configs	2020-08-07 18:40:54 +02:00
Matthew Honnibal	b1d83fc13e	Fix imports	2020-08-07 16:55:54 +02:00
Matthew Honnibal	473504d837	Format	2020-08-07 16:49:00 +02:00
Matthew Honnibal	234c52a91e	Add tok2vec docstrings	2020-08-07 16:48:48 +02:00
Matthew Honnibal	547bc8a82b	Add docstring notes	2020-08-07 16:17:34 +02:00
Ines Montani	6f3649923c	Merge pull request #5893 from explosion/feature/validate-arg	2020-08-07 15:47:20 +02:00
Adriane Boyd	e962784531	Add Lemmatizer and simplify related components (#5848) * Add Lemmatizer and simplify related components * Add `Lemmatizer` pipe with `lookup` and `rule` modes using the `Lookups` tables. * Reduce `Tagger` to a simple tagger that sets `Token.tag` (no pos or lemma) * Reduce `Morphology` to only keep track of morph tags (no tag map, lemmatizer, or morph rules) * Remove lemmatizer from `Vocab` * Adjust many many tests Differences: * No default lookup lemmas * No special treatment of TAG in `from_array` and similar required * Easier to modify labels in a `Tagger` * No extra strings added from morphology / tag map * Fix test * Initial fix for Lemmatizer config/serialization * Adjust init test to be more generic * Adjust init test to force empty Lookups * Add simple cache to rule-based lemmatizer * Convert language-specific lemmatizers Convert language-specific lemmatizers to component lemmatizers. Remove previous lemmatizer class. * Fix French and Polish lemmatizers * Remove outdated UPOS conversions * Update Russian lemmatizer init in tests * Add minimal init/run tests for custom lemmatizers * Add option to overwrite existing lemmas * Update mode setting, lookup loading, and caching * Make `mode` an immutable property * Only enforce strict `load_lookups` for known supported modes * Move caching into individual `_lemmatize` methods * Implement strict when lang is not found in lookups * Fix tables/lookups in make_lemmatizer * Reallow provided lookups and allow for stricter checks * Add lookups asset to all Lemmatizer pipe tests * Rename lookups in lemmatizer init test * Clean up merge * Refactor lookup table loading * Add helper from `load_lemmatizer_lookups` that loads required and optional lookups tables based on settings provided by a config. Additional slight refactor of lookups: * Add `Lookups.set_table` to set a table from a provided `Table` * Reorder class definitions to be able to specify type as `Table` * Move registry assets into test methods * Refactor lookups tables config Use class methods within `Lemmatizer` to provide the config for particular modes and to load the lookups from a config. * Add pipe and score to lemmatizer * Simplify Tagger.score * Add missing import * Clean up imports and auto-format * Remove unused kwarg * Tidy up and auto-format * Update docstrings for Lemmatizer Update docstrings for Lemmatizer. Additionally modify `is_base_form` API to take `Token` instead of individual features. * Update docstrings * Remove tag map values from Tagger.add_label * Update API docs * Fix relative link in Lemmatizer API docs	2020-08-07 15:27:13 +02:00
Matthew Honnibal	da6e59519e	Add docstrings for simple_ner	2020-08-07 15:09:49 +02:00
Matthew Honnibal	7ef8a64df9	Add docstring for parser	2020-08-07 14:59:34 +02:00
Ines Montani	fc9a4fe827	Update attribute ruler	2020-08-07 14:43:55 +02:00
Ines Montani	a8404c3517	validation -> validate	2020-08-07 14:43:47 +02:00
Ines Montani	1d01d89b79	Update CLI docs and evaluate command [ci skip]	2020-08-07 14:40:58 +02:00
Ines Montani	ef2c67cca5	Add DocBin to/from_disk methods and update docs (#5892) * Add DocBin to/from_disk methods and update docs * Use DocBin.from_disk in Corpus	2020-08-07 14:30:59 +02:00
Ines Montani	4ca08c6d5d	Merge pull request #5891 from adrianeboyd/docs/attribute-ruler-api: Add AttributeRuler API docs	2020-08-07 13:55:12 +02:00
Adriane Boyd	b8d0c23857	Add AttributeRuler API docs. With additional minor updates to AttributeRuler docstrings.	2020-08-07 12:43:23 +02:00
svlandeg	b17db0e994	Merge remote-tracking branch 'upstream/develop' into feature/el-docs. Conflicts: website/docs/usage/training.md	2020-08-06 19:48:52 +02:00
Adriane Boyd	06c3a5e048	Add pipe to AttributeRuler (#5889)	2020-08-06 19:43:09 +02:00
Ines Montani	9b7f198390	Fix format	2020-08-06 19:30:53 +02:00
Ines Montani	3c4389110d	Remove unused imports	2020-08-06 19:30:47 +02:00
Matthew Honnibal	d4525816ef	Be less choosy about reporting textcat scores (#5879) * Set textcat scores more consistently * Refactor textcat scores * Fixes to scorer * Add comments * Add threshold * Rename just 'f' to micro_f in textcat scorer * Fix textcat score for two-class * Fix syntax * Fix textcat score * Fix docstring	2020-08-06 16:24:13 +02:00
svlandeg	0b4d1e1bc4	'debug data' instead of 'debug-data'	2020-08-06 15:47:31 +02:00
svlandeg	881e3f8fd0	add docbin explanation and example	2020-08-06 15:29:44 +02:00

7470 Commits