Adriane Boyd
fa8fa474a3
Add nlp.batch_size setting
...
Add a default `batch_size` setting for `Language.pipe` and
`Language.evaluate` as `nlp.batch_size`.
2020-12-09 09:13:26 +01:00
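The fallback described in fa8fa474a3 can be sketched as follows. This is a minimal illustration, not spaCy's implementation: `ToyPipeline` is a hypothetical stand-in for `Language`, and the default value of 1000 is an assumption. It only shows the pattern of an instance-level `batch_size` that is used whenever `pipe` is called without an explicit value.

```python
from typing import Iterable, Iterator, List, Optional


class ToyPipeline:
    """Hypothetical stand-in for a pipeline object; not spaCy's Language class."""

    def __init__(self, batch_size: int = 1000) -> None:
        # Instance-level default, analogous in spirit to `nlp.batch_size`.
        self.batch_size = batch_size

    def pipe(
        self, texts: Iterable[str], batch_size: Optional[int] = None
    ) -> Iterator[List[str]]:
        # Fall back to the instance default when no value is passed.
        if batch_size is None:
            batch_size = self.batch_size
        batch: List[str] = []
        for text in texts:
            batch.append(text)
            if len(batch) == batch_size:
                yield batch
                batch = []
        # Emit the final, possibly shorter, batch.
        if batch:
            yield batch
```

An explicit `batch_size` argument still overrides the default for a single call, so existing callers that pass a value keep their behavior.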
Ines Montani
5d605d539d
Remove output_file from init_config helper
2020-12-09 10:57:55 +11:00
svlandeg
8f8a7f1733
returning config in init_config
2020-12-08 17:37:20 +01:00
Ines Montani
6c7a930ee8
Fix variable
2020-12-08 20:44:59 +11:00
Ines Montani
94a5a9814f
Update argument handling and documentation
2020-12-08 20:41:18 +11:00
Ines Montani
d25b1606d6
Allow reading config from stdin in spacy train
2020-12-08 18:01:40 +11:00
Adriane Boyd
78085fab1f
Check for spacy-nightly package in download (#6502)
...
Also check for spacy-nightly in download so that `--no-deps` isn't set
for normal nightly installs.
2020-12-04 09:40:03 +01:00
Adriane Boyd
591cd48aa8
Remove config.cfg from MANIFEST
2020-12-01 12:58:02 +01:00
Adriane Boyd
b0dd13e0ba
Support LICENSE in spacy package
...
If present, include the file `input_dir/LICENSE` at the top level of the
packaged model.
2020-11-30 13:43:58 +01:00
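The behavior described in b0dd13e0ba (copy `input_dir/LICENSE` to the top level of the packaged model when it exists) might look roughly like the sketch below. The helper name `copy_license_if_present` and its signature are hypothetical and are not taken from the spaCy source.

```python
import shutil
from pathlib import Path


def copy_license_if_present(input_dir: Path, package_dir: Path) -> bool:
    """Copy input_dir/LICENSE to the top level of package_dir.

    Hypothetical helper for illustration. Returns True if a file was copied,
    False if no LICENSE file exists in input_dir.
    """
    license_path = input_dir / "LICENSE"
    if not license_path.is_file():
        # Nothing to include; the package is built without a LICENSE file.
        return False
    package_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy(license_path, package_dir / "LICENSE")
    return True
```

Treating a missing file as a no-op rather than an error keeps the LICENSE optional, which matches the "if present" wording of the commit message.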
Ines Montani
9beba7164f
Make jinja2 top-level import
...
No problem anymore since it's now an official dependency
2020-11-27 15:17:14 +08:00
Sofie Van Landeghem
a0c899a0ff
Fix textcat + transformer architecture (#6371)
...
* add pooling to textcat TransformerListener
* maybe_get_dim in case it's null
2020-11-10 20:14:47 +08:00
Adriane Boyd
a4b32b9552
Handle missing reference values in scorer (#6286)
...
* Handle missing reference values in scorer
Handle missing values in reference doc during scoring where it is
possible to detect an unset state for the attribute. If no reference
docs contain annotation, `None` is returned instead of a score. `spacy
evaluate` displays `-` for missing scores and the missing scores are
saved as `None`/`null` in the metrics.
Attributes without unset states:
* `token.head`: relies on `token.dep` to recognize unset values
* `doc.cats`: unable to handle missing annotation
Additional changes:
* add optional `has_annotation` check to `score_spans` to replace
`doc.sents` hack
* update `score_token_attr_per_feat` to handle missing and empty morph
representations
* fix bug in `Doc.has_annotation` for normalization of `IS_SENT_START`
vs. `SENT_START`
* Fix import
* Update return types
2020-11-03 15:47:18 +01:00
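The scoring rule described in a4b32b9552 can be illustrated with a small, self-contained sketch. It is not the spaCy `Scorer` code: `accuracy_or_none` and `format_score` are hypothetical helpers, unset reference values are modeled as `None`, and the percentage formatting is an assumption. The point is only that tokens without reference annotation are skipped, and that the score itself becomes `None` (displayed as `-`) when no reference annotation exists at all.

```python
from typing import Iterable, Optional, Tuple


def accuracy_or_none(
    pairs: Iterable[Tuple[Optional[str], Optional[str]]]
) -> Optional[float]:
    """Accuracy over (reference, predicted) pairs, skipping unset references.

    Returns None instead of a score when no reference value is set.
    """
    correct = 0
    total = 0
    for reference, predicted in pairs:
        if reference is None:
            # Unset reference annotation: do not count this item at all.
            continue
        total += 1
        if reference == predicted:
            correct += 1
    if total == 0:
        return None
    return correct / total


def format_score(score: Optional[float]) -> str:
    """Render a missing score as '-' and a present score as a percentage."""
    return "-" if score is None else f"{score * 100:.2f}"
```

Returning `None` rather than `0.0` keeps "not annotated" distinguishable from "annotated and always wrong" in saved metrics.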
Ines Montani
2c9804038d
Fix success message [ci skip]
2020-10-23 16:11:54 +02:00
Adriane Boyd
563a21834e
Save raw scores in evaluate output
2020-10-19 15:49:09 +02:00
Adriane Boyd
dd207ca6d0
Add dep_las_per_type and more generic PRF printer
2020-10-19 15:49:02 +02:00
Adriane Boyd
4300858ecb
Include per-type/feat scores in evaluate output
2020-10-19 15:48:55 +02:00
Sofie Van Landeghem
75a202ce65
TextCat updates and fixes (#6263)
...
* small fix in example imports
* throw error when train_corpus or dev_corpus is not a string
* small fix in custom logger example
* limit macro_auc to labels with 2 annotations
* fix typo
* also create parents of output_dir if need be
* update documentation of textcat scores
* refactor TextCatEnsemble
* fix tests for new AUC definition
* bump to 3.0.0a42
* update docs
* rename to spacy.TextCatEnsemble.v2
* spacy.TextCatEnsemble.v1 in legacy
* cleanup
* small fix
* update to 3.0.0rc2
* fix import that got lost in merge
* cursed IDE
* fix two typos
2020-10-18 14:50:41 +02:00
Adriane Boyd
c8d04b79e2
Sort and add vectors for langs without transformers
2020-10-16 08:25:16 +02:00
Adriane Boyd
2fbd43c603
Use core lg models as vectors models in quickstart
2020-10-16 08:17:53 +02:00
Ines Montani
1f49300862
Update transformer recommendations [ci skip]
2020-10-13 15:41:17 +02:00
svlandeg
e972ecba72
add utf8 encoding for opening file
2020-10-09 16:03:14 +02:00
svlandeg
9b4cf7b0b6
update output of debug config command
2020-10-06 09:47:23 +02:00
Ines Montani
181039bd17
Merge pull request #6205 from explosion/feature/embed-features
2020-10-05 21:49:10 +02:00
Matthew Honnibal
b7e01d2024
Fix quickstart
2020-10-05 21:21:30 +02:00
Matthew Honnibal
ff8b980775
Upd quickstart template
2020-10-05 21:19:41 +02:00
Ines Montani
0135f6ed95
Enable commit check via env var
2020-10-05 20:51:15 +02:00
Ines Montani
d58fb42707
Add spacy_version option and validation for project.yml
2020-10-05 20:00:42 +02:00
Ines Montani
84fedcebab
Make args keyword-only [ci skip]
...
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-10-05 17:07:35 +02:00
Ines Montani
6958510bda
Include spaCy version check in project CLI
2020-10-05 13:53:07 +02:00
Ines Montani
bcd52e5486
Tidy up errors and warnings
2020-10-04 11:16:31 +02:00
Ines Montani
3bc3c05fcc
Tidy up and auto-format
2020-10-03 17:20:18 +02:00
Matthew Honnibal
db419f6b2f
Improve control of training progress and logging (#6184)
...
* Make logging and progress easier to control
* Update docs
* Cleanup errors
* Fix ConfigValidationError
* Pass stdout/stderr, not wasabi.Printer
* Fix type
* Upd logging example
* Fix logger example
* Fix type
2020-10-03 14:57:46 +02:00
Adriane Boyd
22158dc24a
Add morphologizer to quickstart template
2020-10-02 15:06:16 +02:00
Ines Montani
f2627157c8
Update docs [ci skip]
2020-10-01 17:38:17 +02:00
Ines Montani
7f68f4bd92
Hide jsonl_loc on init vectors and tidy up [ci skip]
2020-10-01 16:44:17 +02:00
Ines Montani
0a8a124a6e
Update docs [ci skip]
2020-10-01 12:15:53 +02:00
Ines Montani
44160cd52f
Tidy up [ci skip]
2020-10-01 10:41:19 +02:00
Matthew Honnibal
59294e91aa
Restore the 'jsonl' arg for init vectors
...
The lexemes.jsonl file is still used in our English vectors, and it may
be required by users as well. I think it's worth supporting the option.
2020-09-30 19:06:50 +02:00
Ines Montani
23c63eefaf
Tidy up env vars [ci skip]
2020-09-30 15:15:11 +02:00
Ines Montani
a5debb356d
Tidy up and adjust logging [ci skip]
2020-09-30 01:22:08 +02:00
Ines Montani
56a2f778c4
Add logging [ci skip]
2020-09-30 01:08:55 +02:00
Ines Montani
fe3f111c37
Merge pull request #6168 from explosion/fix/default-corpus-values
2020-09-30 00:24:02 +02:00
Ines Montani
ae51843468
Remove augmenter from jinja template [ci skip]
2020-09-29 23:08:50 +02:00
Ines Montani
9bb958fd0a
Fix debug data [ci skip]
2020-09-29 23:07:11 +02:00
Ines Montani
df8dd91b6f
Merge branch 'develop' into fix/default-corpus-values
2020-09-29 22:55:39 +02:00
Ines Montani
0a1ee109db
Remove init from path
2020-09-29 22:53:18 +02:00
Ines Montani
c334a7d45f
Remove
2020-09-29 22:38:39 +02:00
Ines Montani
1aeef3bfbb
Make corpus paths default to None and improve errors
2020-09-29 22:33:46 +02:00
Ines Montani
0250bcf6a3
Show validation error during init
2020-09-29 22:29:09 +02:00
Ines Montani
43c92ec8c9
Resolve dir for better output [ci skip]
2020-09-29 22:01:04 +02:00