
73 Commits

Author SHA1 Message Date
Matthew Honnibal
6a9d14e35a Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-10-05 14:17:41 +02:00
Matthew Honnibal
d2b9aafb8c Fix augmenter 2020-10-05 14:14:49 +02:00
svlandeg
fd2d48556c fix E902 and E903 numbering 2020-10-05 13:43:32 +02:00
Ines Montani
3c36a57e84
Update data augmenters (#6196)
* Draft lower-case augmenter

* Make warning a debug log

* Update lowercase augmenter, docs and tests

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-10-04 17:46:29 +02:00
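The lower-case augmenter mentioned in #6196 can be illustrated in plain Python. This is a minimal sketch, not spaCy's implementation, and it rests on a few assumptions:

- Each example is a hypothetical dict with `text` and `words` keys, not a spaCy `Example` object.
- The function name and the explicit `random.Random` argument are illustrative choices, so results are reproducible.

With probability `level`, the augmenter replaces an example with a lower-cased copy and leaves every other annotation untouched.

```python
import random
from typing import Dict, Iterator, List

# Hypothetical dict-based stand-in for a training example (assumption:
# spaCy itself uses Example objects, which this sketch does not model).
TrainingDict = Dict[str, object]


def lower_casing_augmenter(
    example: TrainingDict, *, level: float, rng: random.Random
) -> Iterator[TrainingDict]:
    """Yield the example unchanged, or, with probability `level`,
    a copy whose text and token strings are lower-cased."""
    if rng.random() >= level:
        # Keep the original casing for this example.
        yield example
        return
    words: List[str] = list(example["words"])  # type: ignore[arg-type]
    # Copy the dict so the caller's example is not mutated; other keys
    # (tags, entities, ...) are carried over as-is.
    yield {
        **example,
        "text": str(example["text"]).lower(),
        "words": [w.lower() for w in words],
    }
```

Because `rng.random()` returns a value in `[0, 1)`, `level=0.0` never alters an example and `level=1.0` always lower-cases it.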
Matthew Honnibal
84ae197dd6 Fix logger 2020-10-04 14:16:53 +02:00
Ines Montani
bcd52e5486 Tidy up errors and warnings 2020-10-04 11:16:31 +02:00
Ines Montani
ff914f4e6f Lazy-load xx 2020-10-04 11:10:26 +02:00
Matthew Honnibal
85ede32680 Format 2020-10-03 19:26:23 +02:00
Matthew Honnibal
b305f2ff5a Fix loggers 2020-10-03 19:26:10 +02:00
Ines Montani
3bc3c05fcc Tidy up and auto-format 2020-10-03 17:20:18 +02:00
Ines Montani
dd542ec6a4 Fix label initialization of textcat component (#6190) 2020-10-03 17:07:38 +02:00
Ines Montani
989a96308f Tidy up, auto-format, types 2020-10-03 16:31:58 +02:00
Matthew Honnibal
db419f6b2f
Improve control of training progress and logging (#6184)
* Make logging and progress easier to control

* Update docs

* Cleanup errors

* Fix ConfigValidationError

* Pass stdout/stderr, not wasabi.Printer

* Fix type

* Upd logging example

* Fix logger example

* Fix type
2020-10-03 14:57:46 +02:00
Ines Montani
01c1538c72 Integrate file readers 2020-10-02 01:36:06 +02:00
Adriane Boyd
86c3ec9c2b
Refactor Token morph setting (#6175)
* Refactor Token morph setting

* Remove `Token.morph_`
* Add `Token.set_morph()`
  * `0` resets `token.c.morph` to unset
  * Any other values are passed to `Morphology.add`

* Add token.morph setter to set from MorphAnalysis
2020-10-01 22:21:46 +02:00
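The `Token.set_morph()` behaviour described in #6175 can be modelled in plain Python. In this model, `0` resets the stored key to "unset", and any other value is handed to `Morphology.add`, which returns the key to store. This is a hypothetical sketch: the `Morphology` and `Token` classes below are simplified stand-ins, not spaCy's Cython classes. The `morph_key` attribute plays the role of `token.c.morph`, and the feature normalization (sorting `Feat=Value` pairs) is an illustrative assumption.

```python
from typing import Dict, Union

Features = Union[int, str, Dict[str, str]]


class Morphology:
    """Hypothetical stand-in that interns analyses and returns nonzero keys."""

    def __init__(self) -> None:
        self._keys: Dict[str, int] = {}
        self._strings: Dict[int, str] = {}

    def add(self, features: Union[str, Dict[str, str]]) -> int:
        # Normalize to a sorted "Feat=Value|..." string so equivalent
        # analyses (string or dict form) share one key.
        if isinstance(features, dict):
            fields = [f"{k}={v}" for k, v in features.items()]
        else:
            fields = [f for f in features.split("|") if f]
        norm = "|".join(sorted(fields))
        if norm not in self._keys:
            key = len(self._keys) + 1  # 0 is reserved to mean "unset"
            self._keys[norm] = key
            self._strings[key] = norm
        return self._keys[norm]

    def get(self, key: int) -> str:
        return self._strings.get(key, "")


class Token:
    """Hypothetical token holding only a morphology key."""

    def __init__(self, morphology: Morphology) -> None:
        self.morphology = morphology
        self.morph_key = 0  # plays the role of token.c.morph; 0 = unset

    def set_morph(self, features: Features) -> None:
        if isinstance(features, int) and features == 0:
            self.morph_key = 0  # reset to unset
        elif isinstance(features, (str, dict)):
            self.morph_key = self.morphology.add(features)
        else:
            raise TypeError("this sketch only accepts 0, a string, or a dict")

    @property
    def morph(self) -> str:
        return self.morphology.get(self.morph_key)
```

Reserving key `0` lets "no analysis" and "analysis set" be told apart without an extra flag.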
Ines Montani
f2627157c8 Update docs [ci skip] 2020-10-01 17:38:17 +02:00
Adriane Boyd
27cbffff1b
Minor edit to CoNLL-U converter (#6172)
This doesn't make a difference given how the `merged_morph` values
override the `morph` values for all the final docs, but could have led
to unexpected bugs in the future if the converter is modified.
2020-10-01 16:23:42 +02:00
Adriane Boyd
df98d3ef9f Update import from collections.abc (#6174) 2020-10-01 16:21:49 +02:00
Ines Montani
44160cd52f Tidy up [ci skip] 2020-10-01 10:41:19 +02:00
Ines Montani
a103ab5f1a Update augmenter lookups and docs 2020-09-30 23:03:47 +02:00
Matthew Honnibal
c379a4274a Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-09-30 16:52:42 +02:00
Matthew Honnibal
e58dca3028 Add read_labels 2020-09-30 16:52:27 +02:00
Ines Montani
fe3f111c37 Merge pull request #6168 from explosion/fix/default-corpus-values 2020-09-30 00:24:02 +02:00
Matthew Honnibal
f52249fe2e Fix data augmentation 2020-09-29 23:40:54 +02:00
Matthew Honnibal
14c4da547f Try to fix augmentation 2020-09-29 23:08:56 +02:00
Ines Montani
df8dd91b6f Merge branch 'develop' into fix/default-corpus-values 2020-09-29 22:55:39 +02:00
Ines Montani
ad6d40d028 Add logging 2020-09-29 22:53:14 +02:00
Ines Montani
1aeef3bfbb Make corpus paths default to None and improve errors 2020-09-29 22:33:46 +02:00
Ines Montani
fa47f87924 Tidy up and auto-format 2020-09-29 21:39:28 +02:00
Ines Montani
d3c63b7965 Merge branch 'develop' into feature/prepare 2020-09-29 20:53:05 +02:00
Ines Montani
2be80379ec Fix small issues, resolve_dot_names and debug model 2020-09-29 20:38:35 +02:00
Ines Montani
fd594cfb9b Tighten up format 2020-09-29 16:47:55 +02:00
Ines Montani
978ab54a84 Fix logging 2020-09-29 16:22:41 +02:00
Ines Montani
aa2a6882d0 Fix logging 2020-09-29 16:08:39 +02:00
Ines Montani
63d1598137 Simplify config use in Language.initialize 2020-09-29 16:05:48 +02:00
Ines Montani
612bbf85ab Update initialize.py 2020-09-29 12:14:47 +02:00
Ines Montani
42f0e4c946 Clean up 2020-09-29 12:14:08 +02:00
Ines Montani
78396d137f Integrate initialize settings 2020-09-29 11:57:08 +02:00
Ines Montani
4925ad760a Add init vectors 2020-09-29 10:58:50 +02:00
Ines Montani
ff9a63bfbd begin_training -> initialize 2020-09-28 21:35:09 +02:00
Ines Montani
046f655d86 Fix error 2020-09-28 21:17:45 +02:00
Ines Montani
a139fe672b Fix typos and refactor CLI logging 2020-09-28 21:17:10 +02:00
Ines Montani
2e9c9e74af Fix config resolution and interpolation
TODO: auto-interpolate in Thinc if config is dict (i.e. likely subsection)
2020-09-28 15:34:00 +02:00
Ines Montani
822ea4ef61 Refactor CLI 2020-09-28 15:09:59 +02:00
Ines Montani
d5155376fd Update vocab init 2020-09-28 11:30:18 +02:00
Matthew Honnibal
a976da168c
Support data augmentation in Corpus (#6155)
* Support data augmentation in Corpus

* Note initial docs for data augmentation

* Add augmenter to quickstart

* Fix flake8

* Format

* Fix test

* Update spacy/tests/training/test_training.py

* Improve data augmentation arguments

* Update templates

* Move randomization out into caller

* Refactor

* Update spacy/training/augment.py

* Update spacy/tests/training/test_training.py

* Fix augment

* Fix test
2020-09-28 03:03:27 +02:00
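The data augmentation support added to `Corpus` in #6155 can be pictured as a simple contract. An augmenter is a callable that takes one example and yields zero or more examples, and the corpus lazily flattens the results. This is a hypothetical plain-Python sketch: `augment_stream` and `identity_augmenter` are illustrative names, and examples are plain strings. spaCy's real `Corpus` works with `Example` objects and passes its augmenters more context than this version does.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical augmenter contract: one example in, zero or more out.
Augmenter = Callable[[str], Iterator[str]]


def identity_augmenter(example: str) -> Iterator[str]:
    """Default augmenter: pass every example through unchanged."""
    yield example


def augment_stream(
    examples: Iterable[str], augmenter: Augmenter = identity_augmenter
) -> Iterator[str]:
    """Apply `augmenter` to each example lazily and flatten the output.

    No shuffling happens here, so the output order follows the input.
    """
    for example in examples:
        yield from augmenter(example)


def add_lowercase_copy(example: str) -> Iterator[str]:
    """Example augmenter: keep the original, add a lower-cased variant."""
    yield example
    if example != example.lower():
        yield example.lower()
```

For instance, `list(augment_stream(["Hello", "world"], add_lowercase_copy))` gives `["Hello", "hello", "world"]`. Because augmenters are generators, one can expand, keep, or drop examples without the corpus needing to know which.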
Matthew Honnibal
13b1605ee6 Add init script 2020-09-28 01:08:49 +02:00
Matthew Honnibal
26afd3bd90 Fix iteration order 2020-09-25 21:47:22 +02:00
Matthew Honnibal
3d8388969e Sort paths for cache consistency 2020-09-25 19:07:26 +02:00
Sofie Van Landeghem
009ba14aaf
Fix pretraining in train script (#6143)
* update pretraining API in train CLI

* bump thinc to 8.0.0a35

* bump to 3.0.0a26

* doc fixes

* small doc fix
2020-09-25 15:47:10 +02:00