Commit Graph

1068 Commits

Author SHA1 Message Date
Ines Montani
2c9804038d Fix success message [ci skip] 2020-10-23 16:11:54 +02:00
Adriane Boyd
563a21834e Save raw scores in evaluate output 2020-10-19 15:49:09 +02:00
Adriane Boyd
dd207ca6d0 Add dep_las_per_type and more generic PRF printer 2020-10-19 15:49:02 +02:00
Adriane Boyd
4300858ecb Include per-type/feat scores in evaluate output 2020-10-19 15:48:55 +02:00
Sofie Van Landeghem
75a202ce65 TextCat updates and fixes (#6263)
* small fix in example imports

* throw error when train_corpus or dev_corpus is not a string

* small fix in custom logger example

* limit macro_auc to labels with 2 annotations

* fix typo

* also create parents of output_dir if need be

* update documentation of textcat scores

* refactor TextCatEnsemble

* fix tests for new AUC definition

* bump to 3.0.0a42

* update docs

* rename to spacy.TextCatEnsemble.v2

* spacy.TextCatEnsemble.v1 in legacy

* cleanup

* small fix

* update to 3.0.0rc2

* fix import that got lost in merge

* cursed IDE

* fix two typos
2020-10-18 14:50:41 +02:00
Adriane Boyd
c8d04b79e2 Sort and add vectors for langs without transformers 2020-10-16 08:25:16 +02:00
Adriane Boyd
2fbd43c603 Use core lg models as vectors models in quickstart 2020-10-16 08:17:53 +02:00
Ines Montani
1f49300862 Update transformer recommendations [ci skip] 2020-10-13 15:41:17 +02:00
svlandeg
e972ecba72 add utf8 encoding for opening file 2020-10-09 16:03:14 +02:00
Sofie Van Landeghem
241cd112f5 add reenabled pipe names back to the meta before serializing (#6219) 2020-10-08 00:44:16 +02:00
svlandeg
9b4cf7b0b6 update output of debug config command 2020-10-06 09:47:23 +02:00
Ines Montani
181039bd17 Merge pull request #6205 from explosion/feature/embed-features 2020-10-05 21:49:10 +02:00
Matthew Honnibal
b7e01d2024 Fix quickstart 2020-10-05 21:21:30 +02:00
Matthew Honnibal
ff8b980775 Upd quickstart template 2020-10-05 21:19:41 +02:00
Ines Montani
0135f6ed95 Enable commit check via env var 2020-10-05 20:51:15 +02:00
Ines Montani
d58fb42707 Add spacy_version option and validation for project.yml 2020-10-05 20:00:42 +02:00
Ines Montani
84fedcebab Make args keyword-only [ci skip]
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-10-05 17:07:35 +02:00
Ines Montani
6958510bda Include spaCy version check in project CLI 2020-10-05 13:53:07 +02:00
Ines Montani
bcd52e5486 Tidy up errors and warnings 2020-10-04 11:16:31 +02:00
Ines Montani
3bc3c05fcc Tidy up and auto-format 2020-10-03 17:20:18 +02:00
Matthew Honnibal
db419f6b2f Improve control of training progress and logging (#6184)
* Make logging and progress easier to control

* Update docs

* Cleanup errors

* Fix ConfigValidationError

* Pass stdout/stderr, not wasabi.Printer

* Fix type

* Upd logging example

* Fix logger example

* Fix type
2020-10-03 14:57:46 +02:00
Adriane Boyd
22158dc24a Add morphologizer to quickstart template 2020-10-02 15:06:16 +02:00
Ines Montani
f2627157c8 Update docs [ci skip] 2020-10-01 17:38:17 +02:00
Ines Montani
7f68f4bd92 Hide jsonl_loc on init vectors and tidy up [ci skip] 2020-10-01 16:44:17 +02:00
Ines Montani
0a8a124a6e Update docs [ci skip] 2020-10-01 12:15:53 +02:00
Ines Montani
44160cd52f Tidy up [ci skip] 2020-10-01 10:41:19 +02:00
Matthew Honnibal
59294e91aa Restore the 'jsonl' arg for init vectors
The lexemes.jsonl file is still used in our English vectors, and it may
be required by users as well. I think it's worth supporting the option.
2020-09-30 19:06:50 +02:00
Ines Montani
23c63eefaf Tidy up env vars [ci skip] 2020-09-30 15:15:11 +02:00
Elijah Rippeth
4cbb954281 reorder so tagmap is replaced only if a custom file is provided. (#6164)
* reorder so tagmap is replaced only if a custom file is provided.

* Remove unneeded variable initialization

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-09-30 13:26:06 +02:00
Ines Montani
a5debb356d Tidy up and adjust logging [ci skip] 2020-09-30 01:22:08 +02:00
Ines Montani
56a2f778c4 Add logging [ci skip] 2020-09-30 01:08:55 +02:00
Ines Montani
fe3f111c37 Merge pull request #6168 from explosion/fix/default-corpus-values 2020-09-30 00:24:02 +02:00
Ines Montani
ae51843468 Remove augmenter from jinja template [ci skip] 2020-09-29 23:08:50 +02:00
Ines Montani
9bb958fd0a Fix debug data [ci skip] 2020-09-29 23:07:11 +02:00
Ines Montani
df8dd91b6f Merge branch 'develop' into fix/default-corpus-values 2020-09-29 22:55:39 +02:00
Ines Montani
0a1ee109db Remove init form path 2020-09-29 22:53:18 +02:00
Ines Montani
c334a7d45f Remove 2020-09-29 22:38:39 +02:00
Ines Montani
1aeef3bfbb Make corpus paths default to None and improve errors 2020-09-29 22:33:46 +02:00
Ines Montani
0250bcf6a3 Show validation error during init 2020-09-29 22:29:09 +02:00
Ines Montani
43c92ec8c9 Resolve dir for better output [ci skip] 2020-09-29 22:01:04 +02:00
Ines Montani
fa47f87924 Tidy up and auto-format 2020-09-29 21:39:28 +02:00
Ines Montani
604be54a5c Support --code in evaluate CLI [ci skip] 2020-09-29 21:20:56 +02:00
Ines Montani
d3c63b7965 Merge branch 'develop' into feature/prepare 2020-09-29 20:53:05 +02:00
Ines Montani
2be80379ec Fix small issues, resolve_dot_names and debug model 2020-09-29 20:38:35 +02:00
Ines Montani
71a0ee274a Move init labels to init pipeline module 2020-09-29 18:09:33 +02:00
Ines Montani
534e1ef498 Fix template 2020-09-29 17:02:55 +02:00
Matthew Honnibal
10847c7f4e Fix arg 2020-09-29 16:48:07 +02:00
Matthew Honnibal
e70a00fa76 Remove unnecessary warning from train 2020-09-29 16:47:54 +02:00
Matthew Honnibal
3f0d61232d Remove outdated arg from train 2020-09-29 16:47:44 +02:00
Matthew Honnibal
e957d66b92 Merge branch 'feature/prepare' of https://github.com/explosion/spaCy into feature/prepare 2020-09-29 16:22:53 +02:00
Matthew Honnibal
45daf5c9fe Add init labels command 2020-09-29 16:22:37 +02:00
Ines Montani
aa2a6882d0 Fix logging 2020-09-29 16:08:39 +02:00
Sofie Van Landeghem
6a04e5adea encoding UTF8 (#6161) 2020-09-29 14:49:55 +02:00
Ines Montani
4925ad760a Add init vectors 2020-09-29 10:58:50 +02:00
Ines Montani
ff9a63bfbd begin_training -> initialize 2020-09-28 21:35:09 +02:00
Ines Montani
a139fe672b Fix typos and refactor CLI logging 2020-09-28 21:17:10 +02:00
Ines Montani
2e9c9e74af Fix config resolution and interpolation
TODO: auto-interpolate in Thinc if config is dict (i.e. likely subsection)
2020-09-28 15:34:00 +02:00
Ines Montani
822ea4ef61 Refactor CLI 2020-09-28 15:09:59 +02:00
Ines Montani
a89e0ff7cb Fix typo 2020-09-28 12:55:21 +02:00
Ines Montani
a62337b3f3 Tidy up vocab init 2020-09-28 12:53:06 +02:00
Ines Montani
c22ecc66bb Don't support init path for now 2020-09-28 12:46:28 +02:00
Ines Montani
a5f2cc0509 Tidy up and remove raw text (rehearsal) for now 2020-09-28 12:30:13 +02:00
Ines Montani
1590de11b1 Update config 2020-09-28 12:05:23 +02:00
Ines Montani
e44a7519cd Update CLI and add [initialize] block 2020-09-28 11:56:14 +02:00
Ines Montani
d5155376fd Update vocab init 2020-09-28 11:30:18 +02:00
Ines Montani
8b74fd19df init pipeline -> init nlp 2020-09-28 11:13:38 +02:00
Ines Montani
2fdb7285a0 Update CLI 2020-09-28 11:06:07 +02:00
Ines Montani
553bfea641 Fix commands 2020-09-28 10:53:17 +02:00
Matthew Honnibal
44bad1474c Add init_pipeline file 2020-09-28 09:47:34 +02:00
Matthew Honnibal
b886f53c31 init-pipeline runs (maybe doesnt work) 2020-09-28 03:42:47 +02:00
Matthew Honnibal
ed2aff2db3 Remove unused train code 2020-09-28 03:12:31 +02:00
Matthew Honnibal
3a0a3b8db6 Dont hard-code for 'corpora' name 2020-09-28 03:06:33 +02:00
Matthew Honnibal
a976da168c Support data augmentation in Corpus (#6155)
* Support data augmentation in Corpus

* Note initial docs for data augmentation

* Add augmenter to quickstart

* Fix flake8

* Format

* Fix test

* Update spacy/tests/training/test_training.py

* Improve data augmentation arguments

* Update templates

* Move randomization out into caller

* Refactor

* Update spacy/training/augment.py

* Update spacy/tests/training/test_training.py

* Fix augment

* Fix test
2020-09-28 03:03:27 +02:00
Matthew Honnibal
a3e1791c9c Upd train 2020-09-28 01:08:30 +02:00
Matthew Honnibal
b5556093e2 Start updating train script 2020-09-27 23:59:44 +02:00
Ines Montani
e04bd16f7f Merge branch 'develop' into feature/new-thinc-config-resolution 2020-09-27 22:34:46 +02:00
Ines Montani
d7ad65a9bb Fix handling of error description [ci skip] 2020-09-27 22:31:57 +02:00
Ines Montani
7e938ed63e Update config resolution to use new Thinc 2020-09-27 22:21:31 +02:00
Matthew Honnibal
39b178999c Tmp notes 2020-09-27 20:13:38 +02:00
Ines Montani
b4486d747d Merge branch 'develop' into fix/train-config-interpolation 2020-09-26 15:32:14 +02:00
Ines Montani
b2d07de786 Construct nlp from uninterpolated config before training 2020-09-26 15:16:59 +02:00
Ines Montani
ca3c997062 Improve CLI config validation with latest Thinc 2020-09-26 13:13:57 +02:00
Matthew Honnibal
3d8388969e Sort paths for cache consistency 2020-09-25 19:07:26 +02:00
Sofie Van Landeghem
009ba14aaf Fix pretraining in train script (#6143)
* update pretraining API in train CLI

* bump thinc to 8.0.0a35

* bump to 3.0.0a26

* doc fixes

* small doc fix
2020-09-25 15:47:10 +02:00
Matthew Honnibal
74ee456374 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-09-24 16:11:47 +02:00
Matthew Honnibal
0bc214c102 Fix pull 2020-09-24 16:11:33 +02:00
Ines Montani
74e1f192b4 Merge pull request #6134 from explosion/feature/training_before_to_disk 2020-09-24 14:44:11 +02:00
Ines Montani
24e7ac3f2b Fix download CLI [ci skip] 2020-09-24 14:43:56 +02:00
Ines Montani
88e54caa12 accuracy -> performance 2020-09-24 14:32:35 +02:00
Ines Montani
be56c0994b Add [training.before_to_disk] callback 2020-09-24 12:40:25 +02:00
Ines Montani
c6c67b606e Merge pull request #6133 from explosion/fix/score_weights 2020-09-24 12:00:57 +02:00
Ines Montani
f69fea8b25 Improve error handling around non-number scores 2020-09-24 11:29:07 +02:00
Matthew Honnibal
17a6b0a173 Make project pull order insensitive (#6131) 2020-09-24 10:30:42 +02:00
Ines Montani
ae51f580c1 Fix handling of score_weights 2020-09-24 10:27:33 +02:00
svlandeg
35dbc63578 Merge remote-tracking branch 'upstream/develop' into fix/nr_features
# Conflicts:
#	spacy/ml/models/parser.py
#	spacy/tests/serialize/test_serialize_config.py
#	website/docs/api/architectures.md
2020-09-23 17:01:13 +02:00
svlandeg
dd2292793f 'parser' instead of 'deps' for state_type 2020-09-23 16:53:49 +02:00
svlandeg
6c85fab316 state_type and extra_state_tokens instead of nr_feature_tokens 2020-09-23 13:35:09 +02:00
Ines Montani
7745d77a38 Fix whitespace in template [ci skip] 2020-09-23 13:21:42 +02:00
svlandeg
6435458d51 simplify expression 2020-09-23 12:12:38 +02:00
svlandeg
20b0ec5dcf avoid logging performance of frozen components 2020-09-23 10:37:12 +02:00