Commit Graph

1012 Commits

Author SHA1 Message Date
Ines Montani
9beba7164f Make jinja2 top-level import
No problem anymore since it's now an official dependency
2020-11-27 15:17:14 +08:00
Sofie Van Landeghem
a0c899a0ff
Fix textcat + transformer architecture (#6371)
* add pooling to textcat TransformerListener

* maybe_get_dim in case it's null
2020-11-10 20:14:47 +08:00
Adriane Boyd
a4b32b9552
Handle missing reference values in scorer (#6286)
* Handle missing reference values in scorer

Handle missing values in reference doc during scoring where it is
possible to detect an unset state for the attribute. If no reference
docs contain annotation, `None` is returned instead of a score. `spacy
evaluate` displays `-` for missing scores and the missing scores are
saved as `None`/`null` in the metrics.
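
As a rough illustration of the behaviour described above (not the actual spaCy scorer code), the sketch below skips unset reference values, returns `None` when no reference annotation exists, and renders a missing score as `-`. The helper names `accuracy_or_none` and `format_score` and the `tag_acc` key are hypothetical and chosen only for this example.

```python
import json
from typing import Optional, Sequence


def accuracy_or_none(gold: Sequence[Optional[str]], pred: Sequence[str]) -> Optional[float]:
    """Accuracy over positions whose reference value is set.

    Hypothetical helper: returns None when no reference value is set at all,
    mirroring the "None instead of a score" behaviour in the commit message.
    """
    # Keep only positions where the reference (gold) value is annotated.
    pairs = [(g, p) for g, p in zip(gold, pred) if g is not None]
    if not pairs:
        return None
    return sum(g == p for g, p in pairs) / len(pairs)


def format_score(score: Optional[float]) -> str:
    """Render a missing score as "-", otherwise with two decimals."""
    return "-" if score is None else f"{score:.2f}"


partial = accuracy_or_none(["NOUN", None, "VERB"], ["NOUN", "ADJ", "ADV"])
missing = accuracy_or_none([None, None], ["NOUN", "VERB"])

print(format_score(partial))
print(format_score(missing))

# A missing score serializes to JSON null when metrics are saved.
print(json.dumps({"tag_acc": missing}))
```

Returning `None` rather than `0.0` keeps "no annotation to score against" distinct from "everything was wrong", which is the distinction the commit message draws.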

Attributes without unset states:

* `token.head`: relies on `token.dep` to recognize unset values
* `doc.cats`: unable to handle missing annotation

Additional changes:

* add optional `has_annotation` check to `score_spans` to replace
`doc.sents` hack
* update `score_token_attr_per_feat` to handle missing and empty morph
representations
* fix bug in `Doc.has_annotation` for normalization of `IS_SENT_START`
vs. `SENT_START`

* Fix import

* Update return types
2020-11-03 15:47:18 +01:00
Ines Montani
2c9804038d Fix success message [ci skip] 2020-10-23 16:11:54 +02:00
Adriane Boyd
563a21834e Save raw scores in evaluate output 2020-10-19 15:49:09 +02:00
Adriane Boyd
dd207ca6d0 Add dep_las_per_type and more generic PRF printer 2020-10-19 15:49:02 +02:00
Adriane Boyd
4300858ecb Include per-type/feat scores in evaluate output 2020-10-19 15:48:55 +02:00
Sofie Van Landeghem
75a202ce65
TextCat updates and fixes (#6263)
* small fix in example imports

* throw error when train_corpus or dev_corpus is not a string

* small fix in custom logger example

* limit macro_auc to labels with 2 annotations

* fix typo

* also create parents of output_dir if need be

* update documentation of textcat scores

* refactor TextCatEnsemble

* fix tests for new AUC definition

* bump to 3.0.0a42

* update docs

* rename to spacy.TextCatEnsemble.v2

* spacy.TextCatEnsemble.v1 in legacy

* cleanup

* small fix

* update to 3.0.0rc2

* fix import that got lost in merge

* cursed IDE

* fix two typos
2020-10-18 14:50:41 +02:00
Adriane Boyd
c8d04b79e2 Sort and add vectors for langs without transformers 2020-10-16 08:25:16 +02:00
Adriane Boyd
2fbd43c603 Use core lg models as vectors models in quickstart 2020-10-16 08:17:53 +02:00
Ines Montani
1f49300862 Update transformer recommendations [ci skip] 2020-10-13 15:41:17 +02:00
svlandeg
e972ecba72 add utf8 encoding for opening file 2020-10-09 16:03:14 +02:00
svlandeg
9b4cf7b0b6 update output of debug config command 2020-10-06 09:47:23 +02:00
Ines Montani
181039bd17
Merge pull request #6205 from explosion/feature/embed-features 2020-10-05 21:49:10 +02:00
Matthew Honnibal
b7e01d2024 Fix quickstart 2020-10-05 21:21:30 +02:00
Matthew Honnibal
ff8b980775 Upd quickstart template 2020-10-05 21:19:41 +02:00
Ines Montani
0135f6ed95 Enable commit check via env var 2020-10-05 20:51:15 +02:00
Ines Montani
d58fb42707 Add spacy_version option and validation for project.yml 2020-10-05 20:00:42 +02:00
Ines Montani
84fedcebab
Make args keyword-only [ci skip]
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-10-05 17:07:35 +02:00
Ines Montani
6958510bda Include spaCy version check in project CLI 2020-10-05 13:53:07 +02:00
Ines Montani
bcd52e5486 Tidy up errors and warnings 2020-10-04 11:16:31 +02:00
Ines Montani
3bc3c05fcc Tidy up and auto-format 2020-10-03 17:20:18 +02:00
Matthew Honnibal
db419f6b2f
Improve control of training progress and logging (#6184)
* Make logging and progress easier to control

* Update docs

* Cleanup errors

* Fix ConfigValidationError

* Pass stdout/stderr, not wasabi.Printer

* Fix type

* Upd logging example

* Fix logger example

* Fix type
2020-10-03 14:57:46 +02:00
Adriane Boyd
22158dc24a Add morphologizer to quickstart template 2020-10-02 15:06:16 +02:00
Ines Montani
f2627157c8 Update docs [ci skip] 2020-10-01 17:38:17 +02:00
Ines Montani
7f68f4bd92 Hide jsonl_loc on init vectors and tidy up [ci skip] 2020-10-01 16:44:17 +02:00
Ines Montani
0a8a124a6e Update docs [ci skip] 2020-10-01 12:15:53 +02:00
Ines Montani
44160cd52f Tidy up [ci skip] 2020-10-01 10:41:19 +02:00
Matthew Honnibal
59294e91aa Restore the 'jsonl' arg for init vectors
The lexemes.jsonl file is still used in our English vectors, and it may
be required by users as well. I think it's worth supporting the option.
2020-09-30 19:06:50 +02:00
Ines Montani
23c63eefaf Tidy up env vars [ci skip] 2020-09-30 15:15:11 +02:00
Ines Montani
a5debb356d Tidy up and adjust logging [ci skip] 2020-09-30 01:22:08 +02:00
Ines Montani
56a2f778c4 Add logging [ci skip] 2020-09-30 01:08:55 +02:00
Ines Montani
fe3f111c37
Merge pull request #6168 from explosion/fix/default-corpus-values 2020-09-30 00:24:02 +02:00
Ines Montani
ae51843468 Remove augmenter from jinja template [ci skip] 2020-09-29 23:08:50 +02:00
Ines Montani
9bb958fd0a Fix debug data [ci skip] 2020-09-29 23:07:11 +02:00
Ines Montani
df8dd91b6f Merge branch 'develop' into fix/default-corpus-values 2020-09-29 22:55:39 +02:00
Ines Montani
0a1ee109db Remove init form path 2020-09-29 22:53:18 +02:00
Ines Montani
c334a7d45f Remove 2020-09-29 22:38:39 +02:00
Ines Montani
1aeef3bfbb Make corpus paths default to None and improve errors 2020-09-29 22:33:46 +02:00
Ines Montani
0250bcf6a3 Show validation error during init 2020-09-29 22:29:09 +02:00
Ines Montani
43c92ec8c9 Resolve dir for better output [ci skip] 2020-09-29 22:01:04 +02:00
Ines Montani
fa47f87924 Tidy up and auto-format 2020-09-29 21:39:28 +02:00
Ines Montani
604be54a5c Support --code in evaluate CLI [ci skip] 2020-09-29 21:20:56 +02:00
Ines Montani
d3c63b7965 Merge branch 'develop' into feature/prepare 2020-09-29 20:53:05 +02:00
Ines Montani
2be80379ec Fix small issues, resolve_dot_names and debug model 2020-09-29 20:38:35 +02:00
Ines Montani
71a0ee274a Move init labels to init pipeline module 2020-09-29 18:09:33 +02:00
Ines Montani
534e1ef498 Fix template 2020-09-29 17:02:55 +02:00
Matthew Honnibal
10847c7f4e Fix arg 2020-09-29 16:48:07 +02:00
Matthew Honnibal
e70a00fa76 Remove unnecessary warning from train 2020-09-29 16:47:54 +02:00
Matthew Honnibal
3f0d61232d Remove outdated arg from train 2020-09-29 16:47:44 +02:00