Ines Montani
796f6c52d1
Merge branch 'develop' into pr/5767
2020-07-19 13:37:46 +02:00
Adriane Boyd
9ee1c54f40
Improve tag map initialization and updating (#5764)
...
* Improve tag map initialization and updating
Generalize tag map initialization and updating so that the tag map can
be loaded correctly prior to loading a `Corpus` with `spacy debug-data`
and `spacy train`.
* normalize provided tag map as necessary
* use the same method for initializing and updating the tag map
* Replace rather than update tag map
Replace rather than update tag map when loading a custom tag map.
Updating the tag map is problematic due to the sorted list of tag names
and the fact that the tag map will contain lingering/unwanted tags from
the default tag map.
* Update CLI scripts
* Reinitialize cache after loading new tag map
Reinitialize the cache with the right size after loading a new tag map.
2020-07-19 13:13:57 +02:00
Adriane Boyd
d83e3c44c5
Remove corpus-specific morph rules
...
* Remove corpus-specific morph rules
* Add options similar to tag maps to provide them in the `train` and
`debug-data` CLIs
2020-07-15 19:44:18 +02:00
Ines Montani
872938ec76
Merge pull request #5747 from explosion/feature/refactor-config-args
2020-07-14 00:00:22 +02:00
Ines Montani
5f6f4ff594
Remove object subclassing
2020-07-12 14:03:23 +02:00
Ines Montani
c96535e338
Update command docstrings and docs
2020-07-12 13:53:49 +02:00
Ines Montani
0ab483037c
Make debug commands subcommands of spacy debug
...
Also handle backwards-compatibility so the old commands don't break
2020-07-12 13:53:41 +02:00
Ines Montani
79346853aa
Add debug-config command
2020-07-12 12:31:17 +02:00
Ines Montani
3a8632c3fb
Hide command from public --help for now
...
Not sure we want this to be officially documented yet?
2020-07-11 19:21:22 +02:00
Ines Montani
5e683d03fe
Allow extra args on pretrain and debug_data
2020-07-11 19:17:59 +02:00
Ines Montani
b7111da1d7
Update config and commands
2020-07-11 13:03:53 +02:00
Ines Montani
f99ce7fbfb
Make validation errors more elegant
2020-07-10 23:34:17 +02:00
Ines Montani
fb6f6f584e
Replace - with _ in command names
...
We might as well be nice if the user accidentally types --training.use-gpu
2020-07-10 22:34:22 +02:00
Ines Montani
bfa8e11ffa
Update and auto-format
2020-07-10 20:52:00 +02:00
Ines Montani
0389c34b81
Merge branch 'develop' into feature/refactor-config-args
2020-07-10 20:51:52 +02:00
Ines Montani
9fe1fa88ad
Fix typo
2020-07-10 20:32:37 +02:00
Ines Montani
defe1e7213
Pretty-print config validation errors
2020-07-10 20:01:20 +02:00
Sofie Van Landeghem
de6a32315c
debug-model script (#5749)
...
* adding debug-model to print the internals for debugging purposes
* expand debug-model script with 4 stages: before, init, train, predict
* avoid enforcing to have a seed in the train script
* small fixes
2020-07-10 19:47:53 +02:00
Ines Montani
a3667394b4
Integrate with latest Thinc and config overrides
2020-07-10 19:47:05 +02:00
Ines Montani
3583ea84d8
Update arg parsing
2020-07-10 18:20:52 +02:00
Ines Montani
73332ddb67
Update CLI commands to use one shared util file
2020-07-10 17:57:40 +02:00
Ines Montani
240e0a62ca
Update with WIP
2020-07-10 13:31:27 +02:00
Ines Montani
a60562f208
Update project CLI hashes, directories, skipping (#5741)
...
* Update project CLI hashes, directories, skipping
* Improve clone success message
* Remove unused context args
* Move project-specific utils to project utils
The hashing/checksum functions may not end up being general-purpose functions and are more designed for the projects, so they shouldn't live in spacy.util
* Improve run help and add workflows
* Add note re: directory checksum speed
* Fix cloning from subdirectories and output messages
* Remove hard-coded dirs
2020-07-09 23:51:18 +02:00
Ines Montani
05e182e421
Update CLI args and docstrings
2020-07-09 19:44:28 +02:00
Adriane Boyd
ac4297ee39
Minor refactor to conversion of output docs (#5718)
...
Minor refactor of conversion of docs to output format to avoid
duplicate conversion steps.
2020-07-09 19:42:32 +02:00
Matthw Honnibal
7010f1a2be
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-07-09 19:34:11 +02:00
Matthw Honnibal
77af0a6bb4
Offer option of padding-sensitive batching
2020-07-09 14:50:20 +02:00
Ines Montani
8f9552d9e7
Refactor project CLI (#5732)
...
* Make project command a submodule
* Update with WIP
* Add helper for joining commands
* Update docstrings, formatting and types
* Update assets and add support for copying local files
* Fix type
* Update success messages
2020-07-09 01:42:51 +02:00
Matthw Honnibal
1b20ffac38
batch_by_words by default
2020-07-08 21:37:06 +02:00
Matthw Honnibal
fb8a5967c1
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-07-08 15:27:50 +02:00
Ines Montani
0a3d41bb1d
Deprecate model shortcuts and simplify download (#5722)
2020-07-08 14:00:07 +02:00
Matthw Honnibal
42e1109def
Support option to not batch by number of words
2020-07-08 11:26:54 +02:00
Ines Montani
8cb7f9ccff
Improve assets and DVC handling (#5719)
...
* Improve assets and DVC handling
* Remove outdated comment [ci skip]
2020-07-07 20:51:50 +02:00
Ines Montani
fa261d09e8
Add alternative CLI option
2020-07-06 15:57:38 +02:00
Adriane Boyd
c67fc6aa5b
Make docs_to_json backwards-compatible with v2 (#5714)
...
* In `spacy convert -t json` output the JSON docs wrapped in a list
* Add back token-level `ner` alongside the doc-level `entities`
2020-07-06 14:15:00 +02:00
Ines Montani
412dbb1f38
Remove dead and/or deprecated code (#5710)
...
* Remove dead and/or deprecated code
* Remove n_threads
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-07-06 13:06:25 +02:00
Sofie Van Landeghem
fcbf899b08
Feature/example only (#5707)
...
* remove _convert_examples
* fix test_gold, raise TypeError if tuples are used instead of Example objects
* throwing proper errors when the wrong type of objects are passed
* fix deprecated format in tests
* fix deprecated format in parser tests
* fix tests for NEL, morph, senter, tagger, textcat
* update regression tests with new Example format
* use make_doc
* more fixes to nlp.update calls
* few more small fixes for rehearse and evaluate
* only import ml_datasets if really necessary
2020-07-06 13:02:36 +02:00
Ines Montani
37c3bb35e2
Auto-format
2020-07-04 16:25:34 +02:00
Ines Montani
99aff16d60
Make argument shortcut consistent
2020-07-04 14:23:32 +02:00
Matthew Honnibal
2bd1bf81f1
Refactor pretrain and support character-based objective for v3 (#5706)
...
* Start adding character-based stuff
* Start adding character-based objective
* Start adding character-based stuff
* Start adding character-based objective
* Remove outdated comment
* Update pretraining models
* Add/fix character-based multi-task models
* Refactor pretrain and support character-based objective
* Update pretrain config
* Remove unused
* Fix flake8 errors
* Clean up imports
* Format
* Format
* Update Thinc version
* Raise error if vectors objective but no vectors
2020-07-03 17:57:28 +02:00
Ines Montani
84fb3a3fb3
Auto-format and fix tuple
2020-07-03 15:20:10 +02:00
Matthew Honnibal
a902b5f217
Record whether Doc objects are built from known spacing (#5697)
...
* Tell convert CLI to store user data for Doc
* Remove assert
* Add has_unknown_spaces flag on Doc
* Do not tokenize docs with unknown spaces in Corpus
* Handle conversion of unknown spaces in Example
* Fixes
* Fixes
* Draft has_known_spaces support in DocBin
* Add test for serialize has_unknown_spaces
* Fix DocBin serialization when has_unknown_spaces
* Use serialization in test
2020-07-03 12:58:16 +02:00
Adriane Boyd
abad56db7d
Add conllu2docs converter (#5704)
...
Add conllu2docs converter adapted from conllu2json converter
2020-07-03 12:54:32 +02:00
Sofie Van Landeghem
41b65fd0f8
fix to pretrain script (#5699)
...
* fix to pretrain script
* remove unnecessary import
2020-07-02 21:48:01 +02:00
Ines Montani
ee8a830248
Merge pull request #5687 from svlandeg/bugfix/init-model
...
Fixing init_model
2020-07-02 14:10:28 +02:00
Ines Montani
fe4cfd0632
Start updating website for v3 [ci skip]
2020-07-01 21:26:39 +02:00
svlandeg
a30bc77415
bugfixing prune_vectors and vectors_loc
2020-07-01 21:00:47 +02:00
Matthw Honnibal
6a0a27e5c2
Fix max_steps
2020-07-01 18:08:14 +02:00
Matthw Honnibal
cb51bb637b
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-07-01 15:17:27 +02:00
Matthw Honnibal
2fa56484b2
Fix eval batch size
2020-07-01 15:16:25 +02:00
Matthw Honnibal
c5d12d1a22
Allow batch size to be set for evaluation in spacy train
2020-07-01 15:04:36 +02:00
Ines Montani
bc87ba97e0
Merge pull request #5681 from svlandeg/bugfix/exec-cwd
2020-07-01 14:13:19 +02:00
Matthw Honnibal
35af5819e0
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-07-01 01:03:39 +02:00
Matthw Honnibal
8c5a88e777
Fix per-epoch shuffling
2020-07-01 01:02:35 +02:00
svlandeg
a7d547c65e
small fix
2020-06-30 21:56:17 +02:00
svlandeg
8eca7e995e
add try-except to git commands to get an informative warning
2020-06-30 21:53:40 +02:00
Ines Montani
b032943c34
Fix funny printing again
2020-06-30 21:33:41 +02:00
Ines Montani
d64644d9d1
Adjust auto-formatting
2020-06-30 20:36:30 +02:00
Ines Montani
6da3500728
Fix command substitution
2020-06-30 20:35:51 +02:00
svlandeg
e7aff9c5fc
bugfix exec usage in dvc.yaml
2020-06-30 18:51:20 +02:00
svlandeg
39953c7c60
fix print_run_help with new arg order
2020-06-30 17:28:09 +02:00
svlandeg
cd632d8ec2
move folder for exec argument one up
2020-06-30 17:19:36 +02:00
svlandeg
1ae6fa2554
move subcommand one place up as project_dir has default
2020-06-30 16:04:53 +02:00
svlandeg
a46b76f188
use current working dir as default throughout
2020-06-30 15:39:24 +02:00
svlandeg
b228111925
fix funny printing
2020-06-30 14:54:45 +02:00
Ines Montani
8e20505970
Resolve within working_dir context manager
2020-06-30 13:29:45 +02:00
Ines Montani
72175b5c60
Update project command
2020-06-30 13:17:26 +02:00
svlandeg
140c4896a0
split_command util function
2020-06-30 12:54:15 +02:00
svlandeg
d23be563eb
remove redundant setting of no_args_is_help
2020-06-30 11:23:35 +02:00
svlandeg
b311ce982f
Merge remote-tracking branch 'upstream/develop' into fix/small-edits
...
# Conflicts:
# spacy/cli/project.py
2020-06-30 11:17:31 +02:00
svlandeg
7e4cbda89a
fix project_init for relative path
2020-06-30 11:09:53 +02:00
Ines Montani
e8033df81e
Also handle python3 and pip3
2020-06-29 20:30:42 +02:00
Ines Montani
c874dde66c
Show help on "spacy project"
2020-06-29 20:11:34 +02:00
Ines Montani
1d2c646e57
Fix init and remove .dvc/plots
2020-06-29 20:07:21 +02:00
svlandeg
1176783310
fix one more shlex.split
2020-06-29 18:37:42 +02:00
svlandeg
894b8e7ff6
throw warning (instead of crashing) when temp dir can't be cleaned
2020-06-29 18:16:39 +02:00
svlandeg
efe7eb71f2
create subfolder in working dir
2020-06-29 17:46:08 +02:00
svlandeg
3487214ba1
fix shlex.split for non-posix
2020-06-29 17:45:47 +02:00
Ines Montani
126050f259
Improve asset fetching
...
Get all paths first and run dvc add once so it only shows one progress bar and one combined git command (if repo is git repo)
2020-06-29 16:55:24 +02:00
Ines Montani
7c08713baa
Improve error messages
2020-06-29 16:54:47 +02:00
Ines Montani
24664efa23
Import project_run_all function
2020-06-29 16:54:19 +02:00
svlandeg
f8dddeda27
print help msg when just calling 'project' without args
2020-06-29 16:38:15 +02:00
svlandeg
bf43ebbf61
fix typos
2020-06-29 16:32:25 +02:00
Sofie Van Landeghem
8d3c0306e1
refactor fixes (#5664)
...
* fixes in ud_train, UX for morphs
* update pyproject with new version of thinc
* fixes in debug_data script
* cleanup of old unused error messages
* remove obsolete TempErrors
* move error messages to errors.py
* add ENT_KB_ID to default DocBin serialization
* few fixes to simple_ner
* fix tags
2020-06-29 14:33:00 +02:00
Ines Montani
569376e34e
Replace curl with requests
2020-06-28 16:25:53 +02:00
Ines Montani
dbe86b3453
Update project.py
2020-06-28 15:45:19 +02:00
Ines Montani
dbfa292ed3
Output more stats in evaluate
2020-06-28 15:34:28 +02:00
Ines Montani
90b7fa8fed
Run DVC command in project dir
2020-06-28 15:33:53 +02:00
Ines Montani
2f6ee0d018
Tidy up, document and add custom clone logic
2020-06-28 15:08:35 +02:00
Matthew Honnibal
dc7a9be9f8
Merge branch 'feature/project-cli' of https://github.com/explosion/spaCy into feature/project-cli
2020-06-28 14:07:53 +02:00
Matthew Honnibal
e08257d401
Add example of how to do sparse-checkout
2020-06-28 14:07:32 +02:00
Ines Montani
1b331237aa
Update hashing and config update
2020-06-28 13:17:19 +02:00
Ines Montani
f385344286
Update asset logic and add import-url
2020-06-28 13:07:31 +02:00
Ines Montani
d6aa4cb478
Update asset logic
2020-06-28 12:40:11 +02:00
Ines Montani
ed46951842
Update
2020-06-28 12:24:59 +02:00
Ines Montani
d54f33441a
Merge branch 'feature/project-cli' of https://github.com/explosion/spaCy into feature/project-cli
2020-06-27 21:17:00 +02:00
Ines Montani
cd0dd78276
Simplify model loading (now supported via load_model)
2020-06-27 21:16:57 +02:00
Matthew Honnibal
8e3baebdce
Merge branch 'feature/project-cli' of https://github.com/explosion/spaCy into feature/project-cli
2020-06-27 21:16:18 +02:00
Matthew Honnibal
d8c70b415e
Fix Example usage in evaluate
2020-06-27 21:15:25 +02:00
Ines Montani
e33d2b1bea
Add success message
2020-06-27 21:15:13 +02:00