Commit Graph

12359 Commits

Author SHA1 Message Date
Adriane Boyd
8f8a5c3386 Fix index boundaries in Span 2020-07-31 14:18:55 +02:00
Adriane Boyd
4f0843e0ec Sort spans before processing 2020-07-30 14:45:43 +02:00
Adriane Boyd
ca33e891e2 Extend AttributeRuler functionality
* Add option to initialize with a dict of AttributeRuler patterns

* Instead of silently discarding overlapping matches (the default
behavior for the retokenizer if only the attrs differ), split the
matches into disjoint sets and retokenize each set separately. This
allows, for instance, one pattern to set the POS and another pattern to
set the lemma. (If two matches modify the same attribute, it looks like
the attrs are applied in the order they were added, but it may not be
deterministic?)

* Improve types
2020-07-30 11:17:33 +02:00
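
The disjoint-set splitting described in ca33e891e2 can be illustrated with a minimal plain-Python sketch. This is not the code from the commit: the `(start, end, attrs)` tuple layout, the greedy first-fit strategy, and the function name `split_into_disjoint_sets` are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical match layout: (start, end, attrs), with `end` exclusive.
Match = Tuple[int, int, Dict[str, str]]


def split_into_disjoint_sets(matches: List[Match]) -> List[List[Match]]:
    """Greedily place each match in the first set where it overlaps nothing."""
    sets: List[List[Match]] = []
    for match in matches:
        start, end, _ = match
        for group in sets:
            # Two spans are disjoint if one ends before the other starts.
            if all(end <= s or start >= e for s, e, _ in group):
                group.append(match)
                break
        else:
            # The match overlaps something in every existing set.
            sets.append([match])
    return sets


matches = [
    (0, 2, {"POS": "NOUN"}),
    (1, 2, {"LEMMA": "be"}),
    (3, 4, {"POS": "VERB"}),
]
groups = split_into_disjoint_sets(matches)
for group in groups:
    print(group)
```

Each resulting set contains no overlaps, so it could be retokenized in its own pass, which is what lets one pattern set the POS and another set the lemma on the same token.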
Adriane Boyd
352b918356 Update name in error message 2020-07-30 10:23:23 +02:00
Adriane Boyd
a6edbcb7d2 Fix default name 2020-07-30 09:38:58 +02:00
Adriane Boyd
63f5951f8b Add AttributeRuler for token attribute exceptions
Add the `AttributeRuler` to handle exceptions for token-level
attributes. The `AttributeRuler` uses `Matcher` patterns to identify
target spans and applies the specified attributes to the token at the
provided index in the matched span. A negative index can be used to
index from the end of the matched span. The retokenizer is used to
"merge" the individual tokens and assign them the provided attributes.

Helper functions can import existing tag maps and morph rules to the
corresponding `Matcher` patterns.

There is an additional minor bug fix for `MORPH` attributes in the
retokenizer to correctly normalize the values and to handle `MORPH`
alongside `_` in an attrs dict.
2020-07-30 09:10:59 +02:00
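
The index handling described in 63f5951f8b (an index into the matched span, with negative values counting from the end) can be sketched in plain Python. This is a stand-in under stated assumptions, not the component's implementation: tokens are modeled as dicts, the match is a `(start, end)` pair with `end` exclusive, and `apply_attrs` is a hypothetical helper. The real component relies on `Matcher` and the retokenizer.

```python
from typing import Dict, List


def apply_attrs(
    tokens: List[Dict[str, str]],
    start: int,
    end: int,
    index: int,
    attrs: Dict[str, str],
) -> int:
    """Apply attrs to the token at `index` within tokens[start:end].

    A negative index counts from the end of the matched span.
    Returns the absolute position of the modified token.
    """
    span_len = end - start
    if not -span_len <= index < span_len:
        raise IndexError(f"index {index} is outside a span of length {span_len}")
    position = start + index if index >= 0 else end + index
    tokens[position].update(attrs)
    return position


tokens = [{"text": t} for t in ["The", "New", "York", "Times"]]
# Hypothetical match covering "New York Times"; -1 targets its last token.
position = apply_attrs(tokens, start=1, end=4, index=-1, attrs={"POS": "PROPN"})
print(position, tokens[position])
```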
Ines Montani
256b24b720 Update arch docs WIP [ci skip] 2020-07-28 20:33:52 +02:00
Ines Montani
2c7a32cf12 Remove unused methods 2020-07-28 16:50:02 +02:00
Ines Montani
ba22111ff4 Move error to Errors 2020-07-28 16:24:14 +02:00
Ines Montani
2748249217 Re-add meta["pipeline"] for now 2020-07-28 16:14:23 +02:00
Ines Montani
b83ead5bf5 Merge pull request #5824 from svlandeg/fix/textcat-v3 2020-07-28 15:04:25 +02:00
Ines Montani
06a97a8766 Support --opt=value format in CLI config overrides 2020-07-28 13:43:15 +02:00
Ines Montani
ae4d8a6ffd Update docstrings, docs and pipe consistency 2020-07-28 13:37:31 +02:00
Ines Montani
0094cb0d04 Remove scores list from config and document 2020-07-28 11:22:24 +02:00
Ines Montani
9b704c3db3 Merge pull request #5819 from explosion/feature/component-scores 2020-07-28 10:40:56 +02:00
Ines Montani
2f83848b1f Fix title [ci skip] 2020-07-27 18:25:38 +02:00
Ines Montani
894e20c466 Merge branch 'develop' into feature/component-scores 2020-07-27 18:14:39 +02:00
Ines Montani
d8b519c23c API docs, docstrings and argument consistency 2020-07-27 18:11:45 +02:00
svlandeg
85b2dcfd67 cleanup 2020-07-27 17:54:44 +02:00
svlandeg
8353ca5a51 remove printing of config 2020-07-27 17:53:36 +02:00
svlandeg
61068e0fb1 util function dot_to_object and corresponding unit test 2020-07-27 17:50:12 +02:00
Ines Montani
10b84e1e27 Add flag to toggle sdist creation on package [ci skip] 2020-07-27 16:52:23 +02:00
svlandeg
674c39bff9 fix train_textcat script 2020-07-27 16:48:21 +02:00
Adriane Boyd
fdf09cb231 Update Scorer API docs for score_cats 2020-07-27 15:34:42 +02:00
Adriane Boyd
34c92dfe63 Add missing Scorer imports 2020-07-27 15:08:51 +02:00
Adriane Boyd
8bb0507777 Add and update score methods and score weights
Add and update `score` methods, provided `scores`, and default weights
`default_score_weights` for pipeline components.

* `scores` provides all top-level keys returned by `score` (merely informative, similar to `assigns`).
* `default_score_weights` provides the default weights for a default config.
* The keys from `default_score_weights` determine which values will be
shown in the `spacy train` output, so keys with weight `0.0` will be
displayed but not counted toward the overall score.
2020-07-27 14:44:53 +02:00
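
The relationship between score weights, displayed columns, and the overall score described in 8bb0507777 can be sketched as follows. This is an illustrative assumption about the arithmetic, not the training code itself: the score names and the helpers `overall_score` and `displayed_columns` are hypothetical, and the overall score is assumed to be a plain weighted sum.

```python
from typing import Dict, List


def overall_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum over the keys listed in the weights (assumed formula)."""
    return sum(scores.get(key, 0.0) * weight for key, weight in weights.items())


def displayed_columns(weights: Dict[str, float]) -> List[str]:
    """Every key in the weights is shown, including those weighted 0.0."""
    return list(weights)


# Hypothetical scores returned by the components' score methods.
scores = {"tag_acc": 0.9, "dep_uas": 0.8, "dep_las": 0.75, "token_acc": 1.0}
# Hypothetical weights: dep_uas is displayed but does not count.
weights = {"tag_acc": 0.5, "dep_uas": 0.0, "dep_las": 0.5}

total = overall_score(scores, weights)
columns = displayed_columns(weights)
print(columns, round(total, 3))
```

Here `dep_uas` appears in the output columns because it has an entry in the weights, while `token_acc` does not appear because it has no entry at all.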
Adriane Boyd
baf19fd652 Update cats scoring to provide overall score
* Provide top-level score as `attr_score`
* Provide a description of the score as `attr_score_desc`
* Provide all potential scores keys, setting unused keys to `None`
* Update CLI evaluate accordingly
2020-07-27 12:26:10 +02:00
Adriane Boyd
f8cf378be9 Combine weights from multiple components
Combine weights from multiple components for the same score.
2020-07-27 10:21:31 +02:00
Ines Montani
7dd53d0964 Fix typo [ci skip] 2020-07-27 00:34:00 +02:00
Ines Montani
7adbaf9a5b Update docs [ci skip] 2020-07-27 00:29:45 +02:00
Ines Montani
3d56a3f286 Make more args keyword-only 2020-07-27 00:27:53 +02:00
Matthew Honnibal
80271ac0ba Update default config 2020-07-26 15:27:39 +02:00
Ines Montani
ed61fb10fc Rename default textcat arch to TextCatEnsemble 2020-07-26 15:11:43 +02:00
Ines Montani
53d37da29a Make sure @factories is removed from config 2020-07-26 15:11:24 +02:00
Matthew Honnibal
ac5901d076 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-07-26 14:20:27 +02:00
Matthew Honnibal
fb5dbe30b5 Trim training 101 2020-07-26 13:43:22 +02:00
Matthew Honnibal
e6a7deb7cc Edits to the training 101 section 2020-07-26 13:42:08 +02:00
Ines Montani
4060c2d5a6 Fix test 2020-07-26 13:40:19 +02:00
Ines Montani
2470486543 Allow pipeline components to set default scores and weights 2020-07-26 13:18:43 +02:00
Ines Montani
787d066e22 Remove pipes.pyx
Probably accidentally re-added in a merge?
2020-07-26 13:08:52 +02:00
Matthew Honnibal
520d25cb50 Add smart_open dependency to fetch project assets (#5812)
* Use smart_open for project assets

* Fix assets.py

* Update pyproject.toml
2020-07-26 12:15:00 +02:00
Ines Montani
c288dba8e7 Update docs [ci skip] 2020-07-25 18:51:12 +02:00
Ines Montani
1346ee06d4 Merge pull request #5813 from explosion/chore/tidy-autoformat-types
Tidy up, autoformat, add types
2020-07-25 18:44:08 +02:00
Ines Montani
eb9acae34d Merge pull request #5791 from adrianeboyd/docs/morphology 2020-07-25 15:10:21 +02:00
Ines Montani
e92df281ce Tidy up, autoformat, add types 2020-07-25 15:01:15 +02:00
Matthew Honnibal
71242327b2 Set version to v3.0.0a5 2020-07-25 14:06:01 +02:00
Matthew Honnibal
afd504f8c0 Update config 2020-07-25 14:04:25 +02:00
Ines Montani
cdbd6ba912 Merge pull request #5798 from explosion/feature/language-data-config 2020-07-25 13:34:49 +02:00
Matthew Honnibal
44a0b072e0 Merge branch 'feature/language-data-config' of https://github.com/explosion/spaCy into feature/language-data-config 2020-07-25 13:34:07 +02:00
Matthew Honnibal
17f39eebdc Update PTB config 2020-07-25 13:33:40 +02:00