spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-01-11 03:01:25 +03:00

Author	SHA1	Message	Date
Adriane Boyd	a6edbcb7d2	Fix default name	2020-07-30 09:38:58 +02:00
Adriane Boyd	63f5951f8b	Add AttributeRuler for token attribute exceptions Add the `AttributeRuler` to handle exceptions for token-level attributes. The `AttributeRuler` uses `Matcher` patterns to identify target spans and applies the specified attributes to the token at the provided index in the matched span. A negative index can be used to index from the end of the matched span. The retokenizer is used to "merge" the individual tokens and assign them the provided attributes. Helper functions can import existing tag maps and morph rules to the corresponding `Matcher` patterns. There is an additional minor bug fix for `MORPH` attributes in the retokenizer to correctly normalize the values and to handle `MORPH` alongside `_` in an attrs dict.	2020-07-30 09:10:59 +02:00
Ines Montani	256b24b720	Update arch docs WIP [ci skip]	2020-07-28 20:33:52 +02:00
Ines Montani	2c7a32cf12	Remove unused methods	2020-07-28 16:50:02 +02:00
Ines Montani	ba22111ff4	Move error to Errors	2020-07-28 16:24:14 +02:00
Ines Montani	2748249217	Re-add meta["pipeline"] for now	2020-07-28 16:14:23 +02:00
Ines Montani	b83ead5bf5	Merge pull request #5824 from svlandeg/fix/textcat-v3	2020-07-28 15:04:25 +02:00
Ines Montani	06a97a8766	Support --opt=value format in CLI config overrides	2020-07-28 13:43:15 +02:00
Ines Montani	ae4d8a6ffd	Update docstrings, docs and pipe consistency	2020-07-28 13:37:31 +02:00
Ines Montani	0094cb0d04	Remove scores list from config and document	2020-07-28 11:22:24 +02:00
Ines Montani	9b704c3db3	Merge pull request #5819 from explosion/feature/component-scores	2020-07-28 10:40:56 +02:00
Ines Montani	2f83848b1f	Fix title [ci skip]	2020-07-27 18:25:38 +02:00
Ines Montani	894e20c466	Merge branch 'develop' into feature/component-scores	2020-07-27 18:14:39 +02:00
Ines Montani	d8b519c23c	API docs, docstrings and argument consistency	2020-07-27 18:11:45 +02:00
svlandeg	85b2dcfd67	cleanup	2020-07-27 17:54:44 +02:00
svlandeg	8353ca5a51	remove printing of config	2020-07-27 17:53:36 +02:00
svlandeg	61068e0fb1	util function dot_to_object and corresponding unit test	2020-07-27 17:50:12 +02:00
Ines Montani	10b84e1e27	Add flag to toggle sdist creation on package [ci skip]	2020-07-27 16:52:23 +02:00
svlandeg	674c39bff9	fix train_textcat script	2020-07-27 16:48:21 +02:00
Adriane Boyd	fdf09cb231	Update Scorer API docs for score_cats	2020-07-27 15:34:42 +02:00
Adriane Boyd	34c92dfe63	Add missing Scorer imports	2020-07-27 15:08:51 +02:00
Adriane Boyd	8bb0507777	Add and update score methods and score weights Add and update `score` methods, provided `scores`, and default weights `default_score_weights` for pipeline components. * `scores` provides all top-level keys returned by `score` (merely informative, similar to `assigns`). * `default_score_weights` provides the default weights for a default config. * The keys from `default_score_weights` determine which values will be shown in the `spacy train` output, so keys with weight `0.0` will be displayed but not counted toward the overall score.	2020-07-27 14:44:53 +02:00
Adriane Boyd	baf19fd652	Update cats scoring to provide overall score * Provide top-level score as `attr_score` * Provide a description of the score as `attr_score_desc` * Provide all potential scores keys, setting unused keys to `None` * Update CLI evaluate accordingly	2020-07-27 12:26:10 +02:00
Adriane Boyd	f8cf378be9	Combine weights from multiple components Combine weights from multiple components for the same score.	2020-07-27 10:21:31 +02:00
Ines Montani	7dd53d0964	Fix typo [ci skip]	2020-07-27 00:34:00 +02:00
Ines Montani	7adbaf9a5b	Update docs [ci skip]	2020-07-27 00:29:45 +02:00
Ines Montani	3d56a3f286	Make more args keyword-only	2020-07-27 00:27:53 +02:00
Matthew Honnibal	80271ac0ba	Update default config	2020-07-26 15:27:39 +02:00
Ines Montani	ed61fb10fc	Rename default textcat arch to TextCatEnsemble	2020-07-26 15:11:43 +02:00
Ines Montani	53d37da29a	Make sure @factories is removed from config	2020-07-26 15:11:24 +02:00
Matthew Honnibal	ac5901d076	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2020-07-26 14:20:27 +02:00
Matthew Honnibal	fb5dbe30b5	Trim training 101	2020-07-26 13:43:22 +02:00
Matthew Honnibal	e6a7deb7cc	Edits to the training 101 section	2020-07-26 13:42:08 +02:00
Ines Montani	4060c2d5a6	Fix test	2020-07-26 13:40:19 +02:00
Ines Montani	2470486543	Allow pipeline components to set default scores and weights	2020-07-26 13:18:43 +02:00
Ines Montani	787d066e22	Remove pipes.pyx Probably accidentally re-added in a merge?	2020-07-26 13:08:52 +02:00
Matthew Honnibal	520d25cb50	Add smart_open dependency to fetch project assets (#5812) * Use smart_open for project assets * Fix assets.py * Update pyproject.toml	2020-07-26 12:15:00 +02:00
Ines Montani	c288dba8e7	Update docs [ci skip]	2020-07-25 18:51:12 +02:00
Ines Montani	1346ee06d4	Merge pull request #5813 from explosion/chore/tidy-autoformat-types Tidy up, autoformat, add types	2020-07-25 18:44:08 +02:00
Ines Montani	eb9acae34d	Merge pull request #5791 from adrianeboyd/docs/morphology	2020-07-25 15:10:21 +02:00
Ines Montani	e92df281ce	Tidy up, autoformat, add types	2020-07-25 15:01:15 +02:00
Matthew Honnibal	71242327b2	Set version to v3.0.0a5	2020-07-25 14:06:01 +02:00
Matthew Honnibal	afd504f8c0	Update config	2020-07-25 14:04:25 +02:00
Ines Montani	cdbd6ba912	Merge pull request #5798 from explosion/feature/language-data-config	2020-07-25 13:34:49 +02:00
Matthew Honnibal	44a0b072e0	Merge branch 'feature/language-data-config' of https://github.com/explosion/spaCy into feature/language-data-config	2020-07-25 13:34:07 +02:00
Matthew Honnibal	17f39eebdc	Update PTB config	2020-07-25 13:33:40 +02:00
Ines Montani	49f27a2a7b	Tidy up [ci skip]	2020-07-25 13:00:49 +02:00
Ines Montani	4a0a692875	Add missing lex_attr_getters (resolves #5806)	2020-07-25 12:55:18 +02:00
Adriane Boyd	2bcceb80c4	Refactor the Scorer to improve flexibility (#5731) * Refactor the Scorer to improve flexibility Refactor the `Scorer` to improve flexibility for arbitrary pipeline components. * Individual pipeline components provide their own `evaluate` methods that score a list of `Example`s and return a dictionary of scores * `Scorer` is initialized either: * with a provided pipeline containing components to be scored * with a default pipeline containing the built-in statistical components (senter, tagger, morphologizer, parser, ner) * `Scorer.score` evaluates a list of `Example`s and returns a dictionary of scores referring to the scores provided by the components in the pipeline Significant differences: * `tags_acc` is renamed to `tag_acc` to be consistent with `token_acc` and the new `morph_acc`, `pos_acc`, and `lemma_acc` * Scoring is no longer cumulative: `Scorer.score` scores a list of examples rather than a single example and does not retain any state about previously scored examples * PRF values in the returned scores are no longer multiplied by 100 * Add kwargs to Morphologizer.evaluate * Create generalized scoring methods in Scorer * Generalized static scoring methods are added to `Scorer` * Methods require an attribute (either on Token or Doc) that is used to key the returned scores Naming differences: * `uas`, `las`, and `las_per_type` in the scores dict are renamed to `dep_uas`, `dep_las`, and `dep_las_per_type` Scoring differences: * `Doc.sents` is now scored as spans rather than on sentence-initial token positions so that `Doc.sents` and `Doc.ents` can be scored with the same method (this lowers scores since a single incorrect sentence start results in two incorrect spans) * Simplify / extend hasattr check for eval method * Add hasattr check to tokenizer scoring * Simplify to hasattr check for component scoring * Reset Example alignment if docs are set Reset the Example alignment if either doc is set in case the tokenization has changed. * Add PRF tokenization scoring for tokens as spans Add PRF scores for tokens as character spans. The scores are: * token_acc: # correct tokens / # gold tokens * token_p/r/f: PRF for (token.idx, token.idx + len(token)) * Add docstring to Scorer.score_tokenization * Rename component.evaluate() to component.score() * Update Scorer API docs * Update scoring for positive_label in textcat * Fix TextCategorizer.score kwargs * Update Language.evaluate docs * Update score names in default config	2020-07-25 12:53:02 +02:00
Ines Montani	c003d26b94	Tidy up	2020-07-25 12:21:37 +02:00
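
The tokenization scores described in commit 2bcceb80c4 (tokens compared as character spans `(token.idx, token.idx + len(token))`, with values not multiplied by 100) can be illustrated with a minimal pure-Python sketch. This is not spaCy's `Scorer.score_tokenization` code: the helper names `token_spans` and `score_token_spans` are hypothetical, and counting a token as correct only when its exact span appears in the gold tokenization is an assumption made for illustration. Under that assumption `token_acc` coincides with recall, which may not hold in the actual implementation.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start_char, end_char), end exclusive


def token_spans(words: List[str], spaces: List[bool]) -> List[Span]:
    """Hypothetical helper: character span (idx, idx + len(token)) per token.

    `spaces[i]` says whether token i is followed by a single space.
    """
    spans = []
    offset = 0
    for word, space in zip(words, spaces):
        spans.append((offset, offset + len(word)))
        offset += len(word) + (1 if space else 0)
    return spans


def score_token_spans(pred: List[Span], gold: List[Span]) -> Dict[str, float]:
    """Hypothetical helper: accuracy and PRF over token character spans.

    Assumption: a token is "correct" only if its exact span is also a
    gold span. Scores stay in the range 0.0-1.0 (not multiplied by 100).
    """
    pred_set, gold_set = set(pred), set(gold)
    correct = len(pred_set & gold_set)
    p = correct / len(pred_set) if pred_set else 0.0
    r = correct / len(gold_set) if gold_set else 0.0
    f = 2 * p * r / (p + r) if (p + r) > 0 else 0.0
    acc = correct / len(gold_set) if gold_set else 0.0
    return {"token_acc": acc, "token_p": p, "token_r": r, "token_f": f}


# Gold tokenization: "New York is big" split into four tokens.
gold = token_spans(["New", "York", "is", "big"], [True, True, True, False])
# Predicted tokenization wrongly merges "New York" into one token.
pred = token_spans(["New York", "is", "big"], [True, True, False])
scores = score_token_spans(pred, gold)
```

In this example two of the three predicted spans match gold spans, so one merged token lowers both precision (2 of 3 predicted) and recall (2 of 4 gold).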


12355 Commits