Ines Montani
c96535e338
Update command docstrings and docs
2020-07-12 13:53:49 +02:00
Ines Montani
0ab483037c
Make debug commands subcommands of spacy debug
...
Also handle backwards-compatibility so the old commands don't break
2020-07-12 13:53:41 +02:00
Ines Montani
79346853aa
Add debug-config command
2020-07-12 12:31:17 +02:00
Ines Montani
3a8632c3fb
Hide command from public --help for now
...
Not sure we want this to be officially documented yet?
2020-07-11 19:21:22 +02:00
Ines Montani
5e683d03fe
Allow extra args on pretrain and debug_data
2020-07-11 19:17:59 +02:00
Ines Montani
b7111da1d7
Update config and commands
2020-07-11 13:03:53 +02:00
Ines Montani
f99ce7fbfb
Make validation errors more elegant
2020-07-10 23:34:17 +02:00
Ines Montani
fb6f6f584e
Replace - with _ in command names
...
We might as well be nice if user accidentally types --training.use-gpu
2020-07-10 22:34:22 +02:00
Ines Montani
bfa8e11ffa
Update and auto-format
2020-07-10 20:52:00 +02:00
Ines Montani
0389c34b81
Merge branch 'develop' into feature/refactor-config-args
2020-07-10 20:51:52 +02:00
Ines Montani
9fe1fa88ad
Fix typo
2020-07-10 20:32:37 +02:00
Ines Montani
defe1e7213
Pretty-print config validation errors
2020-07-10 20:01:20 +02:00
Sofie Van Landeghem
de6a32315c
debug-model script ( #5749 )
...
* adding debug-model to print the internals for debugging purposes
* expand debug-model script with 4 stages: before, init, train, predict
* avoid requiring a seed in the train script
* small fixes
2020-07-10 19:47:53 +02:00
Ines Montani
a3667394b4
Integrate with latest Thinc and config overrides
2020-07-10 19:47:05 +02:00
Ines Montani
3583ea84d8
Update arg parsing
2020-07-10 18:20:52 +02:00
Ines Montani
73332ddb67
Update CLI commands to use one shared util file
2020-07-10 17:57:40 +02:00
Ines Montani
240e0a62ca
Update with WIP
2020-07-10 13:31:27 +02:00
Ines Montani
a60562f208
Update project CLI hashes, directories, skipping ( #5741 )
...
* Update project CLI hashes, directories, skipping
* Improve clone success message
* Remove unused context args
* Move project-specific utils to project utils
The hashing/checksum functions may not end up being general-purpose functions and are more designed for the projects, so they shouldn't live in spacy.util
* Improve run help and add workflows
* Add note re: directory checksum speed
* Fix cloning from subdirectories and output messages
* Remove hard-coded dirs
2020-07-09 23:51:18 +02:00
Ines Montani
05e182e421
Update CLI args and docstrings
2020-07-09 19:44:28 +02:00
Adriane Boyd
ac4297ee39
Minor refactor to conversion of output docs ( #5718 )
...
Minor refactor of conversion of docs to output format to avoid
duplicate conversion steps.
2020-07-09 19:42:32 +02:00
Matthew Honnibal
7010f1a2be
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-07-09 19:34:11 +02:00
Matthew Honnibal
77af0a6bb4
Offer option of padding-sensitive batching
2020-07-09 14:50:20 +02:00
Ines Montani
8f9552d9e7
Refactor project CLI ( #5732 )
...
* Make project command a submodule
* Update with WIP
* Add helper for joining commands
* Update docstrings, formatting and types
* Update assets and add support for copying local files
* Fix type
* Update success messages
2020-07-09 01:42:51 +02:00
Matthew Honnibal
1b20ffac38
batch_by_words by default
2020-07-08 21:37:06 +02:00
Matthew Honnibal
fb8a5967c1
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-07-08 15:27:50 +02:00
Ines Montani
0a3d41bb1d
Deprecate model shortcuts and simplify download ( #5722 )
2020-07-08 14:00:07 +02:00
Matthew Honnibal
42e1109def
Support option to not batch by number of words
2020-07-08 11:26:54 +02:00
Ines Montani
8cb7f9ccff
Improve assets and DVC handling ( #5719 )
...
* Improve assets and DVC handling
* Remove outdated comment [ci skip]
2020-07-07 20:51:50 +02:00
Ines Montani
fa261d09e8
Add alternative CLI option
2020-07-06 15:57:38 +02:00
Adriane Boyd
c67fc6aa5b
Make docs_to_json backwards-compatible with v2 ( #5714 )
...
* In `spacy convert -t json` output the JSON docs wrapped in a list
* Add back token-level `ner` alongside the doc-level `entities`
2020-07-06 14:15:00 +02:00
Ines Montani
412dbb1f38
Remove dead and/or deprecated code ( #5710 )
...
* Remove dead and/or deprecated code
* Remove n_threads
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-07-06 13:06:25 +02:00
Sofie Van Landeghem
fcbf899b08
Feature/example only ( #5707 )
...
* remove _convert_examples
* fix test_gold, raise TypeError if tuples are used instead of Example objects
* throw proper errors when the wrong type of object is passed
* fix deprecated format in tests
* fix deprecated format in parser tests
* fix tests for NEL, morph, senter, tagger, textcat
* update regression tests with new Example format
* use make_doc
* more fixes to nlp.update calls
* few more small fixes for rehearse and evaluate
* only import ml_datasets if really necessary
2020-07-06 13:02:36 +02:00
Ines Montani
37c3bb35e2
Auto-format
2020-07-04 16:25:34 +02:00
Ines Montani
99aff16d60
Make argument shortcut consistent
2020-07-04 14:23:32 +02:00
Matthew Honnibal
2bd1bf81f1
Refactor pretrain and support character-based objective for v3 ( #5706 )
...
* Start adding character-based stuff
* Start adding character-based objective
* Start adding character-based stuff
* Start adding character-based objective
* Remove outdated comment
* Update pretraining models
* Add/fix character-based multi-task models
* Refactor pretrain and support character-based objective
* Update pretrain config
* Remove unused
* Fix flake8 errors
* Clean up imports
* Format
* Format
* Update Thinc version
* Raise error if vectors objective but no vectors
2020-07-03 17:57:28 +02:00
Ines Montani
84fb3a3fb3
Auto-format and fix tuple
2020-07-03 15:20:10 +02:00
Matthew Honnibal
a902b5f217
Record whether Doc objects are built from known spacing ( #5697 )
...
* Tell convert CLI to store user data for Doc
* Remove assert
* Add has_unknown_spaces flag on Doc
* Do not tokenize docs with unknown spaces in Corpus
* Handle conversion of unknown spaces in Example
* Fixes
* Fixes
* Draft has_known_spaces support in DocBin
* Add test for serialize has_unknown_spaces
* Fix DocBin serialization when has_unknown_spaces
* Use serialization in test
2020-07-03 12:58:16 +02:00
Adriane Boyd
abad56db7d
Add conllu2docs converter ( #5704 )
...
Add conllu2docs converter adapted from conllu2json converter
2020-07-03 12:54:32 +02:00
Sofie Van Landeghem
41b65fd0f8
fix to pretrain script ( #5699 )
...
* fix to pretrain script
* remove unnecessary import
2020-07-02 21:48:01 +02:00
Ines Montani
ee8a830248
Merge pull request #5687 from svlandeg/bugfix/init-model
...
Fixing init_model
2020-07-02 14:10:28 +02:00
Ines Montani
fe4cfd0632
Start updating website for v3 [ci skip]
2020-07-01 21:26:39 +02:00
svlandeg
a30bc77415
bugfixing prune_vectors and vectors_loc
2020-07-01 21:00:47 +02:00
Matthew Honnibal
6a0a27e5c2
Fix max_steps
2020-07-01 18:08:14 +02:00
Matthew Honnibal
cb51bb637b
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-07-01 15:17:27 +02:00
Matthew Honnibal
2fa56484b2
Fix eval batch size
2020-07-01 15:16:25 +02:00
Matthew Honnibal
c5d12d1a22
Allow batch size to be set for evaluation in spacy train
2020-07-01 15:04:36 +02:00
Ines Montani
bc87ba97e0
Merge pull request #5681 from svlandeg/bugfix/exec-cwd
2020-07-01 14:13:19 +02:00
Matthew Honnibal
35af5819e0
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-07-01 01:03:39 +02:00
Matthew Honnibal
8c5a88e777
Fix per-epoch shuffling
2020-07-01 01:02:35 +02:00
svlandeg
a7d547c65e
small fix
2020-06-30 21:56:17 +02:00