svlandeg
8f8a7f1733
returning config in init_config
2020-12-08 17:37:20 +01:00
Ines Montani
6c7a930ee8
Fix variable
2020-12-08 20:44:59 +11:00
Ines Montani
94a5a9814f
Update argument handling and documentation
2020-12-08 20:41:18 +11:00
Adriane Boyd
5ceac425ee
Remove non-working --use-chars from train CLI
...
Remove the non-working `--use-chars` option from the train CLI. The
implementation of the option across component types and the CLI settings
could be fixed, but the `CharacterEmbed` model does not work on GPU in
v2 so it's better to remove it.
2020-12-08 08:30:00 +01:00
Ines Montani
d25b1606d6
Allow reading config from stdin in spacy train
2020-12-08 18:01:40 +11:00
Adriane Boyd
78085fab1f
Check for spacy-nightly package in download ( #6502 )
...
Also check for spacy-nightly in download so that `--no-deps` isn't set
for normal nightly installs.
2020-12-04 09:40:03 +01:00
Adriane Boyd
591cd48aa8
Remove config.cfg from MANIFEST
2020-12-01 12:58:02 +01:00
Adriane Boyd
b0dd13e0ba
Support LICENSE in spacy package
...
If present, include the file `input_dir/LICENSE` at the top level of the
packaged model.
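A minimal sketch of the behaviour described above, not the actual `spacy package` code. The helper name `copy_license` and the `package_dir` parameter are hypothetical; only the `input_dir/LICENSE` convention comes from this commit message.

```python
import shutil
from pathlib import Path


def copy_license(input_dir: Path, package_dir: Path) -> bool:
    """Copy input_dir/LICENSE to the top level of package_dir, if present.

    Hypothetical helper for illustration. Returns True if a file was copied.
    """
    license_path = input_dir / "LICENSE"
    if not license_path.is_file():
        # No LICENSE in the input directory: nothing to include, not an error.
        return False
    package_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy(license_path, package_dir / "LICENSE")
    return True
```

Returning a boolean rather than raising keeps the file optional, which matches the "if present" wording.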
2020-11-30 13:43:58 +01:00
Ines Montani
9beba7164f
Make jinja2 top-level import
...
No problem anymore since it's now an official dependency
2020-11-27 15:17:14 +08:00
Adriane Boyd
573f5c863f
Fix tag map clobbering in spacy train ( #6437 )
...
Fix bug from #5768 where the tag map is clobbered if a custom tag map
isn't provided.
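The intent of the fix can be sketched as follows. This is an illustration only: `resolve_tag_map` and its parameters are hypothetical names, not the code changed in this commit.

```python
from typing import Dict, Optional


def resolve_tag_map(
    default_tag_map: Dict[str, dict],
    custom_tag_map: Optional[Dict[str, dict]] = None,
) -> Dict[str, dict]:
    """Return the tag map to use, replacing the default only when asked to."""
    if custom_tag_map is None:
        # No custom tag map was provided: keep the existing one instead of
        # overwriting ("clobbering") it with an empty value.
        return default_tag_map
    return custom_tag_map
```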
2020-11-24 13:13:16 +01:00
Sofie Van Landeghem
a0c899a0ff
Fix textcat + transformer architecture ( #6371 )
...
* add pooling to textcat TransformerListener
* maybe_get_dim in case it's null
2020-11-10 20:14:47 +08:00
Adriane Boyd
a4b32b9552
Handle missing reference values in scorer ( #6286 )
...
* Handle missing reference values in scorer
Handle missing values in reference doc during scoring where it is
possible to detect an unset state for the attribute. If no reference
docs contain annotation, `None` is returned instead of a score. `spacy
evaluate` displays `-` for missing scores and the missing scores are
saved as `None`/`null` in the metrics.
Attributes without unset states:
* `token.head`: relies on `token.dep` to recognize unset values
* `doc.cats`: unable to handle missing annotation
Additional changes:
* add optional `has_annotation` check to `score_spans` to replace
`doc.sents` hack
* update `score_token_attr_per_feat` to handle missing and empty morph
representations
* fix bug in `Doc.has_annotation` for normalization of `IS_SENT_START`
vs. `SENT_START`
* Fix import
* Update return types
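The handling of missing scores described in this commit (`None` instead of a score, `-` in the `spacy evaluate` table, `null` in the saved metrics) can be sketched roughly as below. The function names `format_score` and `metrics_to_json` are hypothetical and the two-decimal formatting is an assumption; this is not the scorer or CLI code itself.

```python
import json
from typing import Dict, Optional


def format_score(score: Optional[float]) -> str:
    """Render a score for a results table, using "-" when it is missing."""
    if score is None:
        return "-"
    return f"{score:.2f}"


def metrics_to_json(metrics: Dict[str, Optional[float]]) -> str:
    """Serialize metrics; Python None is written as JSON null."""
    return json.dumps(metrics)
```

Keeping `None` in the data and converting to `-` only at display time means the saved metrics stay machine-readable.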
2020-11-03 15:47:18 +01:00
Ines Montani
2c9804038d
Fix success message [ci skip]
2020-10-23 16:11:54 +02:00
Adriane Boyd
563a21834e
Save raw scores in evaluate output
2020-10-19 15:49:09 +02:00
Adriane Boyd
dd207ca6d0
Add dep_las_per_type and more generic PRF printer
2020-10-19 15:49:02 +02:00
Adriane Boyd
4300858ecb
Include per-type/feat scores in evaluate output
2020-10-19 15:48:55 +02:00
Sofie Van Landeghem
75a202ce65
TextCat updates and fixes ( #6263 )
...
* small fix in example imports
* throw error when train_corpus or dev_corpus is not a string
* small fix in custom logger example
* limit macro_auc to labels with 2 annotations
* fix typo
* also create parents of output_dir if need be
* update documentation of textcat scores
* refactor TextCatEnsemble
* fix tests for new AUC definition
* bump to 3.0.0a42
* update docs
* rename to spacy.TextCatEnsemble.v2
* spacy.TextCatEnsemble.v1 in legacy
* cleanup
* small fix
* update to 3.0.0rc2
* fix import that got lost in merge
* cursed IDE
* fix two typos
2020-10-18 14:50:41 +02:00
Adriane Boyd
c8d04b79e2
Sort and add vectors for langs without transformers
2020-10-16 08:25:16 +02:00
Adriane Boyd
2fbd43c603
Use core lg models as vectors models in quickstart
2020-10-16 08:17:53 +02:00
Ines Montani
1f49300862
Update transformer recommendations [ci skip]
2020-10-13 15:41:17 +02:00
svlandeg
e972ecba72
add utf8 encoding for opening file
2020-10-09 16:03:14 +02:00
Sofie Van Landeghem
241cd112f5
add reenabled pipe names back to the meta before serializing ( #6219 )
2020-10-08 00:44:16 +02:00
svlandeg
9b4cf7b0b6
update output of debug config command
2020-10-06 09:47:23 +02:00
Ines Montani
181039bd17
Merge pull request #6205 from explosion/feature/embed-features
2020-10-05 21:49:10 +02:00
Matthew Honnibal
b7e01d2024
Fix quickstart
2020-10-05 21:21:30 +02:00
Matthew Honnibal
ff8b980775
Upd quickstart template
2020-10-05 21:19:41 +02:00
Ines Montani
0135f6ed95
Enable commit check via env var
2020-10-05 20:51:15 +02:00
Ines Montani
d58fb42707
Add spacy_version option and validation for project.yml
2020-10-05 20:00:42 +02:00
Ines Montani
84fedcebab
Make args keyword-only [ci skip]
...
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-10-05 17:07:35 +02:00
Ines Montani
6958510bda
Include spaCy version check in project CLI
2020-10-05 13:53:07 +02:00
Ines Montani
bcd52e5486
Tidy up errors and warnings
2020-10-04 11:16:31 +02:00
Ines Montani
3bc3c05fcc
Tidy up and auto-format
2020-10-03 17:20:18 +02:00
Matthew Honnibal
db419f6b2f
Improve control of training progress and logging ( #6184 )
...
* Make logging and progress easier to control
* Update docs
* Cleanup errors
* Fix ConfigValidationError
* Pass stdout/stderr, not wasabi.Printer
* Fix type
* Upd logging example
* Fix logger example
* Fix type
2020-10-03 14:57:46 +02:00
Adriane Boyd
22158dc24a
Add morphologizer to quickstart template
2020-10-02 15:06:16 +02:00
Ines Montani
f2627157c8
Update docs [ci skip]
2020-10-01 17:38:17 +02:00
Ines Montani
7f68f4bd92
Hide jsonl_loc on init vectors and tidy up [ci skip]
2020-10-01 16:44:17 +02:00
Ines Montani
0a8a124a6e
Update docs [ci skip]
2020-10-01 12:15:53 +02:00
Ines Montani
44160cd52f
Tidy up [ci skip]
2020-10-01 10:41:19 +02:00
Matthew Honnibal
59294e91aa
Restore the 'jsonl' arg for init vectors
...
The lexemes.jsonl file is still used in our English vectors, and it may
be required by users as well. I think it's worth supporting the option.
2020-09-30 19:06:50 +02:00
Ines Montani
23c63eefaf
Tidy up env vars [ci skip]
2020-09-30 15:15:11 +02:00
Elijah Rippeth
4cbb954281
reorder so tagmap is replaced only if a custom file is provided. ( #6164 )
...
* reorder so tagmap is replaced only if a custom file is provided.
* Remove unneeded variable initialization
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-09-30 13:26:06 +02:00
Ines Montani
a5debb356d
Tidy up and adjust logging [ci skip]
2020-09-30 01:22:08 +02:00
Ines Montani
56a2f778c4
Add logging [ci skip]
2020-09-30 01:08:55 +02:00
Ines Montani
fe3f111c37
Merge pull request #6168 from explosion/fix/default-corpus-values
2020-09-30 00:24:02 +02:00
Ines Montani
ae51843468
Remove augmenter from jinja template [ci skip]
2020-09-29 23:08:50 +02:00
Ines Montani
9bb958fd0a
Fix debug data [ci skip]
2020-09-29 23:07:11 +02:00
Ines Montani
df8dd91b6f
Merge branch 'develop' into fix/default-corpus-values
2020-09-29 22:55:39 +02:00
Ines Montani
0a1ee109db
Remove init from path
2020-09-29 22:53:18 +02:00
Ines Montani
c334a7d45f
Remove
2020-09-29 22:38:39 +02:00
Ines Montani
1aeef3bfbb
Make corpus paths default to None and improve errors
2020-09-29 22:33:46 +02:00
Ines Montani
0250bcf6a3
Show validation error during init
2020-09-29 22:29:09 +02:00
Ines Montani
43c92ec8c9
Resolve dir for better output [ci skip]
2020-09-29 22:01:04 +02:00
Ines Montani
fa47f87924
Tidy up and auto-format
2020-09-29 21:39:28 +02:00
Ines Montani
604be54a5c
Support --code in evaluate CLI [ci skip]
2020-09-29 21:20:56 +02:00
Ines Montani
d3c63b7965
Merge branch 'develop' into feature/prepare
2020-09-29 20:53:05 +02:00
Ines Montani
2be80379ec
Fix small issues, resolve_dot_names and debug model
2020-09-29 20:38:35 +02:00
Ines Montani
71a0ee274a
Move init labels to init pipeline module
2020-09-29 18:09:33 +02:00
Ines Montani
534e1ef498
Fix template
2020-09-29 17:02:55 +02:00
Matthew Honnibal
10847c7f4e
Fix arg
2020-09-29 16:48:07 +02:00
Matthew Honnibal
e70a00fa76
Remove unnecessary warning from train
2020-09-29 16:47:54 +02:00
Matthew Honnibal
3f0d61232d
Remove outdated arg from train
2020-09-29 16:47:44 +02:00
Matthew Honnibal
e957d66b92
Merge branch 'feature/prepare' of https://github.com/explosion/spaCy into feature/prepare
2020-09-29 16:22:53 +02:00
Matthew Honnibal
45daf5c9fe
Add init labels command
2020-09-29 16:22:37 +02:00
Ines Montani
aa2a6882d0
Fix logging
2020-09-29 16:08:39 +02:00
Sofie Van Landeghem
6a04e5adea
encoding UTF8 ( #6161 )
2020-09-29 14:49:55 +02:00
Ines Montani
4925ad760a
Add init vectors
2020-09-29 10:58:50 +02:00
Ines Montani
ff9a63bfbd
begin_training -> initialize
2020-09-28 21:35:09 +02:00
Ines Montani
a139fe672b
Fix typos and refactor CLI logging
2020-09-28 21:17:10 +02:00
Ines Montani
2e9c9e74af
Fix config resolution and interpolation
...
TODO: auto-interpolate in Thinc if config is dict (i.e. likely subsection)
2020-09-28 15:34:00 +02:00
Ines Montani
822ea4ef61
Refactor CLI
2020-09-28 15:09:59 +02:00
Ines Montani
a89e0ff7cb
Fix typo
2020-09-28 12:55:21 +02:00
Ines Montani
a62337b3f3
Tidy up vocab init
2020-09-28 12:53:06 +02:00
Ines Montani
c22ecc66bb
Don't support init path for now
2020-09-28 12:46:28 +02:00
Ines Montani
a5f2cc0509
Tidy up and remove raw text (rehearsal) for now
2020-09-28 12:30:13 +02:00
Ines Montani
1590de11b1
Update config
2020-09-28 12:05:23 +02:00
Ines Montani
e44a7519cd
Update CLI and add [initialize] block
2020-09-28 11:56:14 +02:00
Ines Montani
d5155376fd
Update vocab init
2020-09-28 11:30:18 +02:00
Ines Montani
8b74fd19df
init pipeline -> init nlp
2020-09-28 11:13:38 +02:00
Ines Montani
2fdb7285a0
Update CLI
2020-09-28 11:06:07 +02:00
Ines Montani
553bfea641
Fix commands
2020-09-28 10:53:17 +02:00
Matthew Honnibal
44bad1474c
Add init_pipeline file
2020-09-28 09:47:34 +02:00
Matthew Honnibal
b886f53c31
init-pipeline runs (maybe doesn't work)
2020-09-28 03:42:47 +02:00
Matthew Honnibal
ed2aff2db3
Remove unused train code
2020-09-28 03:12:31 +02:00
Matthew Honnibal
3a0a3b8db6
Don't hard-code 'corpora' name
2020-09-28 03:06:33 +02:00
Matthew Honnibal
a976da168c
Support data augmentation in Corpus ( #6155 )
...
* Support data augmentation in Corpus
* Note initial docs for data augmentation
* Add augmenter to quickstart
* Fix flake8
* Format
* Fix test
* Update spacy/tests/training/test_training.py
* Improve data augmentation arguments
* Update templates
* Move randomization out into caller
* Refactor
* Update spacy/training/augment.py
* Update spacy/tests/training/test_training.py
* Fix augment
* Fix test
2020-09-28 03:03:27 +02:00
Matthew Honnibal
a3e1791c9c
Upd train
2020-09-28 01:08:30 +02:00
Matthew Honnibal
b5556093e2
Start updating train script
2020-09-27 23:59:44 +02:00
Ines Montani
e04bd16f7f
Merge branch 'develop' into feature/new-thinc-config-resolution
2020-09-27 22:34:46 +02:00
Ines Montani
d7ad65a9bb
Fix handling of error description [ci skip]
2020-09-27 22:31:57 +02:00
Ines Montani
7e938ed63e
Update config resolution to use new Thinc
2020-09-27 22:21:31 +02:00
Matthew Honnibal
39b178999c
Tmp notes
2020-09-27 20:13:38 +02:00
Ines Montani
b4486d747d
Merge branch 'develop' into fix/train-config-interpolation
2020-09-26 15:32:14 +02:00
Ines Montani
b2d07de786
Construct nlp from uninterpolated config before training
2020-09-26 15:16:59 +02:00
Ines Montani
ca3c997062
Improve CLI config validation with latest Thinc
2020-09-26 13:13:57 +02:00
Matthew Honnibal
3d8388969e
Sort paths for cache consistency
2020-09-25 19:07:26 +02:00
Sofie Van Landeghem
009ba14aaf
Fix pretraining in train script ( #6143 )
...
* update pretraining API in train CLI
* bump thinc to 8.0.0a35
* bump to 3.0.0a26
* doc fixes
* small doc fix
2020-09-25 15:47:10 +02:00
Matthew Honnibal
74ee456374
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-09-24 16:11:47 +02:00
Matthew Honnibal
0bc214c102
Fix pull
2020-09-24 16:11:33 +02:00
Ines Montani
74e1f192b4
Merge pull request #6134 from explosion/feature/training_before_to_disk
2020-09-24 14:44:11 +02:00
Ines Montani
24e7ac3f2b
Fix download CLI [ci skip]
2020-09-24 14:43:56 +02:00
Ines Montani
88e54caa12
accuracy -> performance
2020-09-24 14:32:35 +02:00
Ines Montani
be56c0994b
Add [training.before_to_disk] callback
2020-09-24 12:40:25 +02:00
Ines Montani
c6c67b606e
Merge pull request #6133 from explosion/fix/score_weights
2020-09-24 12:00:57 +02:00
Ines Montani
f69fea8b25
Improve error handling around non-number scores
2020-09-24 11:29:07 +02:00
Matthew Honnibal
17a6b0a173
Make project pull order insensitive ( #6131 )
2020-09-24 10:30:42 +02:00
Ines Montani
ae51f580c1
Fix handling of score_weights
2020-09-24 10:27:33 +02:00
svlandeg
35dbc63578
Merge remote-tracking branch 'upstream/develop' into fix/nr_features
...
# Conflicts:
# spacy/ml/models/parser.py
# spacy/tests/serialize/test_serialize_config.py
# website/docs/api/architectures.md
2020-09-23 17:01:13 +02:00
svlandeg
dd2292793f
'parser' instead of 'deps' for state_type
2020-09-23 16:53:49 +02:00
svlandeg
6c85fab316
state_type and extra_state_tokens instead of nr_feature_tokens
2020-09-23 13:35:09 +02:00
Ines Montani
7745d77a38
Fix whitespace in template [ci skip]
2020-09-23 13:21:42 +02:00
svlandeg
6435458d51
simplify expression
2020-09-23 12:12:38 +02:00
svlandeg
20b0ec5dcf
avoid logging performance of frozen components
2020-09-23 10:37:12 +02:00
Ines Montani
6ca06cb62c
Update docs and formatting [ci skip]
2020-09-23 10:14:27 +02:00
Ines Montani
888f936a73
Merge pull request #6106 from svlandeg/feature/textcat-quickstart
2020-09-23 10:11:45 +02:00
Ines Montani
60a317520a
Merge pull request #6109 from svlandeg/feature/2rename
2020-09-23 09:47:12 +02:00
svlandeg
556f3e4652
add pooling to NEL's TransformerListener
2020-09-23 09:24:28 +02:00
Sofie Van Landeghem
86a08f819d
tok2vec.update instead of predict ( #6113 )
2020-09-22 21:54:52 +02:00
Ines Montani
5e3b796b12
Validate section refs in debug config
2020-09-22 12:24:39 +02:00
svlandeg
085a1c8e2b
add no_output_layer to TextCatBOW config
2020-09-22 12:06:40 +02:00
svlandeg
b556a10808
rename converts in_to_out
2020-09-22 11:50:19 +02:00
svlandeg
e931f4d757
add textcat score
2020-09-22 10:56:43 +02:00
svlandeg
396b33257f
add entity_linker to jinja template
2020-09-22 10:40:05 +02:00
svlandeg
135de82a2d
add textcat to quickstart
2020-09-22 10:22:06 +02:00
Ines Montani
6316d5f398
Improve messages in project CLI [ci skip]
2020-09-22 09:45:34 +02:00
Ines Montani
81606b29bd
Merge pull request #6104 from svlandeg/fix/debug_model [ci skip]
2020-09-22 09:31:23 +02:00
svlandeg
45b29c4a5b
cleanup
2020-09-21 23:17:23 +02:00
svlandeg
fa5c416db6
initialize through nlp object and with train_corpus
2020-09-21 23:09:22 +02:00
svlandeg
447b3e5787
Merge remote-tracking branch 'upstream/develop' into fix/debug_model
...
# Conflicts:
# spacy/cli/debug_model.py
2020-09-21 16:58:40 +02:00
Ines Montani
e8bcaa44f1
Don't auto-decompress archives with smart_open [ci skip]
2020-09-21 16:01:46 +02:00
svlandeg
eb9b447960
Merge remote-tracking branch 'upstream/develop' into fix/debug_model
...
# Conflicts:
# spacy/cli/debug_model.py
2020-09-21 14:05:16 +02:00
Ines Montani
758ead8a47
Sync overrides with CLI overrides
2020-09-21 12:50:13 +02:00
Ines Montani
5497acf49a
Support config overrides via environment variables
2020-09-21 11:25:10 +02:00
Ines Montani
1114219ae3
Tidy up and auto-format
2020-09-21 10:59:07 +02:00
Ines Montani
b2302c0a1c
Improve error for missing dependency
2020-09-20 17:44:51 +02:00
Matthew Honnibal
8fb59d958c
Format
2020-09-20 16:31:48 +02:00
Matthew Honnibal
dc22771f87
Fix sparse checkout
2020-09-20 16:30:05 +02:00
Matthew Honnibal
a0fb5e50db
Use simple git clone call if not sparse
2020-09-20 16:22:04 +02:00
Matthew Honnibal
2c24d633d0
Use updated run_command
2020-09-20 16:21:43 +02:00
Ines Montani
554c9a2497
Update docs [ci skip]
2020-09-20 12:30:53 +02:00
svlandeg
6db1d5dc0d
trying some stuff
2020-09-19 19:11:30 +02:00
Ines Montani
e863b3dc14
Merge pull request #6092 from adrianeboyd/bugfix/load-vocab-lookups-2
2020-09-19 12:33:38 +02:00
Sofie Van Landeghem
39872de1f6
Introducing the gpu_allocator ( #6091 )
...
* rename 'use_pytorch_for_gpu_memory' to 'gpu_allocator'
* --code instead of --code-path
* update documentation
* avoid querying the "system" section directly
* add explanation of gpu_allocator to TF/PyTorch section in docs
* fix typo
* fix typo 2
* use set_gpu_allocator from thinc 8.0.0a34
* default null instead of empty string
2020-09-19 01:17:02 +02:00
svlandeg
73ff52b9ec
hack for tok2vec listener
2020-09-18 16:43:15 +02:00
Adriane Boyd
eed4b785f5
Load vocab lookups tables at beginning of training
...
Similar to how vectors are handled, move the vocab lookups to be loaded
at the start of training rather than when the vocab is initialized,
since the vocab doesn't have access to the full config when it's
created.
The option moves from `nlp.load_vocab_data` to `training.lookups`.
Typically these tables will come from `spacy-lookups-data`, but any
`Lookups` object can be provided.
The loading from `spacy-lookups-data` is now strict, so configs for each
language should specify the exact tables required. This also makes it
easier to control whether the larger clusters and probs tables are
included.
To load `lexeme_norm` from `spacy-lookups-data`:
```
[training.lookups]
@misc = "spacy.LoadLookupsData.v1"
lang = ${nlp.lang}
tables = ["lexeme_norm"]
```
2020-09-18 15:59:16 +02:00
Ines Montani
a127fa475e
Merge pull request #6078 from svlandeg/fix/corpus
2020-09-18 14:44:21 +02:00
svlandeg
e4fc7e0222
fixing output sample to proper 2D array
2020-09-17 22:34:36 +02:00
Ines Montani
3865214343
Use consistent shortcut
2020-09-17 16:57:02 +02:00
svlandeg
35a3931064
fix typo
2020-09-17 16:36:27 +02:00
svlandeg
ddfc1fc146
add pretraining option to init config
2020-09-17 16:05:40 +02:00
svlandeg
427dbecdd6
cleanup and formatting
2020-09-17 11:48:04 +02:00
svlandeg
0c35885751
generalize corpora, dot notation for dev and train corpus
2020-09-17 11:38:59 +02:00
svlandeg
51fa929f47
rewrite train_corpus to corpus.train in config
2020-09-15 21:58:04 +02:00
Ines Montani
9cc304c194
Merge pull request #6064 from explosion/fix/sparse-checkout-ux
...
Fix sparse checkout and error handling
2020-09-15 00:32:20 +02:00
Sofie Van Landeghem
3216a33149
positive_label config for textcat ( #6062 )
...
* hook up positive_label in textcat
* unit tests
* documentation
* formatting
* tests
* fix typo
* move verify_config to after begin_training
* revert accidental commit
2020-09-14 17:08:00 +02:00
Ines Montani
c052017025
Fix sparse checkout and error handling
2020-09-14 14:12:58 +02:00
Matthew Honnibal
54c40223a1
Improve v3 pretrain command ( #6040 )
...
* Starts to run
* Update pretrain script
* Update corpus
* Update pretrain schema
* Remove outdated test
* Make JsonlTexts produce Example objects.
2020-09-13 14:05:05 +02:00
Ines Montani
febb99916d
Tidy up and auto-format [ci skip]
2020-09-13 10:55:36 +02:00
Ines Montani
a5633b205f
Fix handling of errors around git [ci skip]
2020-09-13 10:52:28 +02:00
Ines Montani
f8846c198d
Update types and docstrings
2020-09-13 10:52:02 +02:00
Matthew Honnibal
37347830d4
Fix reading in GloVe vectors
2020-09-12 17:31:18 +02:00
Ines Montani
b41be87213
Merge pull request #6051 from svlandeg/feature/cli-config
2020-09-12 17:12:35 +02:00
Ines Montani
eedaaaec75
Fix handling of existing asset without checksum [ci skip]
2020-09-12 17:02:53 +02:00
svlandeg
a75cfe0da6
Merge remote-tracking branch 'upstream/develop' into feature/cli-config
2020-09-12 14:44:40 +02:00
svlandeg
115147804a
string_to_list to parse comma-separated string into a list
2020-09-12 14:43:22 +02:00
Ines Montani
f886f5bbc8
Merge pull request #6048 from explosion/fix/clone-compat
2020-09-12 10:30:49 +02:00
Ines Montani
0b2e07215d
Support overwriting name on spacy package
2020-09-11 11:38:28 +02:00
svlandeg
5b94aeece9
support pipeline as "list in string"
2020-09-11 11:08:46 +02:00
Ines Montani
1bce432b4a
Adjust message [ci skip]
2020-09-11 10:00:49 +02:00
Ines Montani
5acd4fbcd8
Merge branch 'develop' into fix/clone-compat
2020-09-11 09:58:30 +02:00
Ines Montani
761bd60d43
Adjust info message
2020-09-11 09:57:00 +02:00
Ines Montani
6831161bfa
Resolve path to be extra sure
2020-09-11 09:56:49 +02:00
svlandeg
1723fb73c4
remove brol
2020-09-10 17:44:59 +02:00
svlandeg
08a831ce83
process trailing slash if any
2020-09-10 17:39:52 +02:00
Ines Montani
3e83a509bb
WIP: fix project clone compatibility
2020-09-10 15:49:13 +02:00
svlandeg
f1bc09c1e9
restore partly
2020-09-10 14:53:02 +02:00
svlandeg
3889747119
asset fix & UX
2020-09-10 14:36:53 +02:00
svlandeg
a36766d153
hookup branch
2020-09-10 12:00:34 +02:00
svlandeg
97d99f7efa
Merge remote-tracking branch 'upstream/develop' into feature/doc-fixes
2020-09-10 11:51:34 +02:00
Ines Montani
908f3a4494
Update default projects repo [ci skip]
2020-09-10 11:42:14 +02:00
svlandeg
92f9d2f406
small UX fixes
2020-09-10 11:35:50 +02:00
svlandeg
1fc5486792
more fine-grained errors for git_sparse_checkout
2020-09-10 11:31:32 +02:00
Ines Montani
15bc3a37b4
Add --branch to project clone
2020-09-10 11:08:15 +02:00
Sofie Van Landeghem
8e7557656f
Renaming gold & annotation_setter ( #6042 )
...
* version bump to 3.0.0a16
* rename "gold" folder to "training"
* rename 'annotation_setter' to 'set_extra_annotations'
* formatting
2020-09-09 10:31:03 +02:00
Sofie Van Landeghem
60f22e1800
Pipe API ( #6034 )
...
* ensure Language passes on valid examples for initialization
* fix tagger model initialization
* check for valid get_examples across components
* assume labels were added before begin_training
* fix senter initialization
* fix morphologizer initialization
* use methods to check arguments
* test textcat init, requires thinc>=8.0.0a31
* fix tok2vec init
* fix entity linker init
* use islice
* fix simple NER
* cleanup debug model
* fix assert statements
* fix tests
* throw error when adding a label if the output layer can't be resized anymore
* fix test
* add failing test for simple_ner
* UX improvements
* morphologizer UX
* assume begin_training gets a representative set and processes the labels
* remove assumptions for output of untrained NER model
* restore test for original purpose
2020-09-08 22:44:25 +02:00
Matthew Honnibal
ba5f4c9b32
Add words and seconds to train info
2020-09-08 15:24:47 +02:00
Matthew Honnibal
b470062153
Add CLI registry ( #6037 )
2020-09-08 15:23:34 +02:00
Matthew Honnibal
4b7abaafdb
Fix learn rate for non-transformer
2020-09-04 21:22:50 +02:00
Matthew Honnibal
465785a672
Fix project pull and push
2020-09-04 21:15:55 +02:00
Ines Montani
ab1bb421ed
Update docs links in codebase
2020-09-04 12:58:50 +02:00
Ines Montani
2189046869
Merge pull request #6024 from explosion/chore/registry-renaming
2020-09-04 10:54:10 +02:00
Matthew Honnibal
1c07820681
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-09-03 18:54:21 +02:00
Matthew Honnibal
7be8a0516a
Fix project pull
2020-09-03 18:54:03 +02:00
Ines Montani
23b7d9cfa3
Prefix span getters
2020-09-03 17:37:06 +02:00
Ines Montani
c063e55eb7
Add prefix to batchers
2020-09-03 17:30:41 +02:00
Ines Montani
c53b1433b9
Adjust more arguments [ci skip]
2020-09-03 17:12:24 +02:00
Ines Montani
b5a0657fd6
"model" terminology consistency in docs
2020-09-03 13:13:03 +02:00
Matthew Honnibal
122cb02001
Fix averages
2020-09-02 19:37:43 +02:00
Marek Grzenkowicz
92d7832a86
Fix off-by-one error for best iteration calculation ( closes #6014 ) ( #6016 )
2020-09-02 15:15:45 +02:00
Sofie Van Landeghem
6bfb1b3a29
Fix sparse checkout for 'spacy project' ( #6008 )
...
* exit if cloning fails
* UX
* rewrite http link to git protocol, don't use stdin
* fixes to sparse checkout
* formatting
2020-09-01 19:49:01 +02:00
Ines Montani
70b226f69d
Support ignore marker in project document [ci skip]
2020-09-01 12:49:04 +02:00
Ines Montani
a4c51f0f18
Add v3 info to project docs [ci skip]
2020-09-01 12:36:21 +02:00
Ines Montani
ef9005273b
Update fill-config command and add silent mode [ci skip]
2020-09-01 12:07:04 +02:00
Matthew Honnibal
ec660e3131
Fix use_pytorch_for_gpu_memory
2020-09-01 00:41:38 +02:00
Matthw Honnibal
c38298b8fa
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-08-31 19:55:55 +02:00
Matthw Honnibal
fe298fa50a
Shuffle on first epoch of train
2020-08-31 19:55:22 +02:00
svlandeg
13ee742fb4
example of custom logger
2020-08-31 14:24:41 +02:00
svlandeg
c18eb63483
Merge remote-tracking branch 'upstream/develop' into feature/vectors-docs
...
# Conflicts:
# website/docs/usage/embeddings-transformers.md
2020-08-31 13:21:36 +02:00
Sofie Van Landeghem
ec14744ee4
Rename Transformer listener ( #6001 )
...
* rename to spacy-transformers.TransformerListener
* add some more tok2vec tests
* use select_pipes
* fix docs - annotation setter was not changed in the end
2020-08-31 12:41:39 +02:00
Ines Montani
45f46a5c85
Merge pull request #5993 from explosion/feature/disabled-components
2020-08-29 15:58:41 +02:00
Ines Montani
34146750d4
Use frozen list with custom errors
...
We don't want to break backwards compatibility too much but we also want to provide the best possible UX
2020-08-29 15:20:11 +02:00
Ines Montani
2bc31e15c9
Tidy up and auto-format [ci skip]
2020-08-29 13:01:10 +02:00
svlandeg
5230529de2
add loggers registry & logger docs sections
2020-08-28 21:44:04 +02:00
Ines Montani
4ca2698f85
Merge branch 'develop' into feature/debug-config
2020-08-28 11:19:17 +02:00
Ines Montani
d1780db6a4
Tidy up and use different error [ci skip]
2020-08-27 18:56:55 +02:00
Ines Montani
ff4175e839
Add more info to debug config
2020-08-27 18:17:58 +02:00
Ines Montani
8692d176f6
Merge pull request #5978 from explosion/feature/update-wasabi
...
Update wasabi: new diff_strings and MarkdownRenderer
2020-08-26 19:02:52 +02:00
Matthew Honnibal
9b22714a4e
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-08-26 15:48:45 +02:00
Matthew Honnibal
172af24f95
Fix upload and download
2020-08-26 15:48:23 +02:00
Ines Montani
a5fff1df51
Remove outdated non-empty output dir warning [ci skip]
2020-08-26 15:45:51 +02:00
Ines Montani
3aec98ca38
Update wasabi: new diff_strings and MarkdownRenderer
2020-08-26 15:33:11 +02:00
Sofie Van Landeghem
79d460e3a2
Weights & Biases logger for train CLI ( #5971 )
...
* quick test as part of train script
* train_logger in config, default ConsoleLogger in loggers catalogue
* entitiy typo
* add wandb_logger
* cleanup
* Update spacy/cli/train_logger.py
Co-authored-by: Ines Montani <ines@ines.io>
* move loggers to gold.loggers
Co-authored-by: Ines Montani <ines@ines.io>
2020-08-26 15:24:33 +02:00
Ines Montani
0997c30b9e
Merge pull request #5974 from explosion/feature/project-document
2020-08-26 15:14:13 +02:00
Ines Montani
627617a079
Tidy up and add docs [ci skip]
2020-08-26 13:24:55 +02:00
Ines Montani
aeebc6678d
Small cleanup and adjustments
2020-08-26 10:26:57 +02:00
Ines Montani
31567d1e42
Link project.yml
2020-08-26 10:26:32 +02:00
Ines Montani
6c2a5ff53b
Auto-link local sources
2020-08-26 10:26:06 +02:00
Matthew Honnibal
2771e4f2b3
Fix the git "sparse checkout" functionality ( #5973 )
...
* Fix the git sparse checkout functionality
* Format
2020-08-26 04:00:14 +02:00
Ines Montani
1c958a76c1
Add comment markers to only replace auto-generated docs
2020-08-26 00:03:06 +02:00
Ines Montani
f10989e8c4
Add "project document" and more project.yml meta fields
2020-08-25 17:14:27 +02:00
Ines Montani
fdcaf86c54
Adjust docstring
...
End sentence earlier so it's shown as a full sentence in --help
2020-08-25 17:13:50 +02:00
Ines Montani
b89f6fa011
Fix meta defaults and error in package command
2020-08-25 17:13:33 +02:00
Ines Montani
dd84577a98
Update CLI utils, project.yml schema and add test
2020-08-25 11:54:53 +02:00
Matthew Honnibal
8038b87f04
Various small tweaks to project CLI ( #5965 )
...
* Fix up/download of http and local paths
* Support git_sparse_checkout for assets
* Fix scorer
* Handle already-present directories for git assets
* Improve convert command
* Fix support for existing files in git assets
* Support branches in git sparse checkout
* Format
* Fix git assets
* Document git block in assets
* Fix test
* Fix test
* Revert "Fix test"
This reverts commit cf3097260f.
* Revert "Fix test"
This reverts commit 964d636e27.
* Don't multiply p/r/f by 100
* Display scores * 100 during training
2020-08-25 00:30:52 +02:00
Ines Montani
e12b03358b
Support removing extra values in fill-config ( #5966 )
...
* Support removing extra values in fill-config
* Fix test
2020-08-24 22:53:47 +02:00
Ines Montani
0e7f99da58
Fix handling of optional [pretraining] block ( #5954 )
...
* Fix handling of optional [pretraining] block
* Remove pretraining from default config
* Fix test
* Add schema option for empty pretrain block
2020-08-24 15:56:03 +02:00
Matthew Honnibal
64df37643f
Update lockfile after project pull
2020-08-24 03:27:09 +02:00
Matthew Honnibal
588c28fe45
Fix project pull when deps missing
2020-08-24 01:23:36 +02:00
Matthew Honnibal
160a855246
Format
2020-08-23 21:15:12 +02:00
Matthew Honnibal
89f5b8abb3
Fix project push
2020-08-23 21:14:44 +02:00
Matthew Honnibal
3828bc3ed0
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-08-23 18:32:24 +02:00
Matthew Honnibal
e559867605
Allow spacy project to push and pull to/from remote storage ( #5949 )
...
* Add utils for working with remote storage
* WIP add remote_cache for project
* WIP add push and pull commands
* Use pathy in remote_cache
* Update util
* Update remote_cache
* Update util
* Update project assets
* Update pull script
* Update push script
* Fix type annotation in util
* Work on remote storage
* Remove site and env hash
* Fix imports
* Fix type annotation
* Require pathy
* Require pathy
* Fix import
* Add a util to handle project variable substitution
* Import push and pull commands
* Fix pull command
* Fix push command
* Fix tarfile in remote_storage
* Improve printing
* Fiddle with status messages
* Set version to v3.0.0a9
* Draft docs for spacy project remote storages
* Update docs [ci skip]
* Use Thinc config to simplify and unify template variables
* Auto-format
* Don't import Pathy globally for now
Causes slow and annoying Google Cloud warning
* Tidy up test
* Tidy up and update tests
* Update to latest Thinc
* Update docs
* variables -> vars
* Update docs [ci skip]
* Update docs [ci skip]
Co-authored-by: Ines Montani <ines@ines.io>
2020-08-23 18:32:09 +02:00
Matthew Honnibal
fe1cf7e124
Allow score_weights to list extra scores
2020-08-23 18:31:30 +02:00
Ines Montani
9bdc9e81f5
Fix error message [ci skip]
2020-08-23 12:14:02 +02:00
Ines Montani
3826cfb8fe
Merge pull request #5930 from svlandeg/feature/init-config-fix
...
UX for init config
2020-08-21 12:06:33 +02:00
Ines Montani
79af7dcd6d
Small wording adjustments [ci skip]
2020-08-21 12:06:19 +02:00
Matthew Honnibal
c356e62908
Minor adjustments to quickstart template
2020-08-21 00:10:21 +02:00
Ines Montani
6ad59d59fe
Merge branch 'develop' of https://github.com/explosion/spaCy into develop [ci skip]
2020-08-20 11:20:58 +02:00
svlandeg
b96cd9fa5e
fix typo
2020-08-19 18:46:08 +02:00
Ines Montani
e2f2ef3a5a
Update init config and recommendations
...
- As much as I dislike YAML, it seemed like a better format here because it allows us to add comments if we want to explain the different recommendations
- Don't include the generated JS in the repo by default and build it on the fly when running or deploying the site. This ensures it's always up to date.
- Simplify jinja_to_js script and use fewer dependencies
2020-08-19 13:33:15 +02:00
svlandeg
a8acedd4ba
example of custom reader and batcher
2020-08-18 19:15:16 +02:00