Commit Graph

688 Commits

Author SHA1 Message Date
Ines Montani
5f6f4ff594 Remove object subclassing 2020-07-12 14:03:23 +02:00
Sofie Van Landeghem
de6a32315c
debug-model script (#5749)
* adding debug-model to print the internals for debugging purposes

* extend debug-model script with 4 stages: before, init, train, predict

* avoid requiring a seed in the train script

* small fixes
2020-07-10 19:47:53 +02:00
Ines Montani
a60562f208
Update project CLI hashes, directories, skipping (#5741)
* Update project CLI hashes, directories, skipping

* Improve clone success message

* Remove unused context args

* Move project-specific utils to project utils

The hashing/checksum functions may not end up being general-purpose functions and are more designed for the projects, so they shouldn't live in spacy.util

* Improve run help and add workflows

* Add note re: directory checksum speed

* Fix cloning from subdirectories and output messages

* Remove hard-coded dirs
2020-07-09 23:51:18 +02:00
Ines Montani
05e182e421 Update CLI args and docstrings 2020-07-09 19:44:28 +02:00
Adriane Boyd
ac4297ee39
Minor refactor to conversion of output docs (#5718)
Minor refactor of conversion of docs to output format to avoid
duplicate conversion steps.
2020-07-09 19:42:32 +02:00
Matthw Honnibal
7010f1a2be Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-07-09 19:34:11 +02:00
Matthw Honnibal
77af0a6bb4 Offer option of padding-sensitive batching 2020-07-09 14:50:20 +02:00
Ines Montani
8f9552d9e7
Refactor project CLI (#5732)
* Make project command a submodule

* Update with WIP

* Add helper for joining commands

* Update docstrings, formatting and types

* Update assets and add support for copying local files

* Fix type

* Update success messages
2020-07-09 01:42:51 +02:00
Matthw Honnibal
1b20ffac38 batch_by_words by default 2020-07-08 21:37:06 +02:00
Matthw Honnibal
fb8a5967c1 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-07-08 15:27:50 +02:00
Ines Montani
0a3d41bb1d
Deprecate model shortcuts and simplify download (#5722) 2020-07-08 14:00:07 +02:00
Matthw Honnibal
42e1109def Support option to not batch by number of words 2020-07-08 11:26:54 +02:00
Ines Montani
8cb7f9ccff
Improve assets and DVC handling (#5719)
* Improve assets and DVC handling

* Remove outdated comment [ci skip]
2020-07-07 20:51:50 +02:00
Ines Montani
fa261d09e8 Add alternative CLI option 2020-07-06 15:57:38 +02:00
Adriane Boyd
c67fc6aa5b
Make docs_to_json backwards-compatible with v2 (#5714)
* In `spacy convert -t json` output the JSON docs wrapped in a list

* Add back token-level `ner` alongside the doc-level `entities`
2020-07-06 14:15:00 +02:00
Ines Montani
412dbb1f38
Remove dead and/or deprecated code (#5710)
* Remove dead and/or deprecated code

* Remove n_threads

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-07-06 13:06:25 +02:00
Sofie Van Landeghem
fcbf899b08
Feature/example only (#5707)
* remove _convert_examples

* fix test_gold, raise TypeError if tuples are used instead of Examples

* throw proper errors when the wrong type of object is passed

* fix deprecated format in tests

* fix deprecated format in parser tests

* fix tests for NEL, morph, senter, tagger, textcat

* update regression tests with new Example format

* use make_doc

* more fixes to nlp.update calls

* few more small fixes for rehearse and evaluate

* only import ml_datasets if really necessary
2020-07-06 13:02:36 +02:00
Ines Montani
37c3bb35e2 Auto-format 2020-07-04 16:25:34 +02:00
Ines Montani
99aff16d60 Make argument shortcut consistent 2020-07-04 14:23:32 +02:00
Matthew Honnibal
2bd1bf81f1
Refactor pretrain and support character-based objective for v3 (#5706)
* Start adding character-based stuff

* Start adding character-based objective

* Start adding character-based stuff

* Start adding character-based objective

* Remove outdated comment

* Update pretraining models

* Add/fix character-based multi-task models

* Refactor pretrain and support character-based objective

* Update pretrain config

* Remove unused

* Fix flake8 errors

* Clean up imports

* Format

* Format

* Update Thinc version

* Raise error if vectors objective but no vectors
2020-07-03 17:57:28 +02:00
Ines Montani
84fb3a3fb3 Auto-format and fix tuple 2020-07-03 15:20:10 +02:00
Matthew Honnibal
a902b5f217
Record whether Doc objects are built from known spacing (#5697)
* Tell convert CLI to store user data for Doc

* Remove assert

* Add has_unknown_spaces flag on Doc

* Do not tokenize docs with unknown spaces in Corpus

* Handle conversion of unknown spaces in Example

* Fixes

* Fixes

* Draft has_known_spaces support in DocBin

* Add test for serialize has_unknown_spaces

* Fix DocBin serialization when has_unknown_spaces

* Use serialization in test
2020-07-03 12:58:16 +02:00
Adriane Boyd
abad56db7d
Add conllu2docs converter (#5704)
Add conllu2docs converter adapted from conllu2json converter
2020-07-03 12:54:32 +02:00
Sofie Van Landeghem
41b65fd0f8
fix to pretrain script (#5699)
* fix to pretrain script

* remove unnecessary import
2020-07-02 21:48:01 +02:00
Ines Montani
ee8a830248
Merge pull request #5687 from svlandeg/bugfix/init-model
Fixing init_model
2020-07-02 14:10:28 +02:00
Ines Montani
fe4cfd0632 Start updating website for v3 [ci skip] 2020-07-01 21:26:39 +02:00
svlandeg
a30bc77415 bugfixing prune_vectors and vectors_loc 2020-07-01 21:00:47 +02:00
Matthw Honnibal
6a0a27e5c2 Fix max_steps 2020-07-01 18:08:14 +02:00
Matthw Honnibal
cb51bb637b Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-07-01 15:17:27 +02:00
Matthw Honnibal
2fa56484b2 Fix eval batch size 2020-07-01 15:16:25 +02:00
Matthw Honnibal
c5d12d1a22 Allow batch size to be set for evaluation in spacy train 2020-07-01 15:04:36 +02:00
Ines Montani
bc87ba97e0
Merge pull request #5681 from svlandeg/bugfix/exec-cwd 2020-07-01 14:13:19 +02:00
Matthw Honnibal
35af5819e0 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2020-07-01 01:03:39 +02:00
Matthw Honnibal
8c5a88e777 Fix per-epoch shuffling 2020-07-01 01:02:35 +02:00
svlandeg
a7d547c65e small fix 2020-06-30 21:56:17 +02:00
svlandeg
8eca7e995e add try-except to git commands to get an informative warning 2020-06-30 21:53:40 +02:00
Ines Montani
b032943c34 Fix funny printing again 2020-06-30 21:33:41 +02:00
Ines Montani
d64644d9d1 Adjust auto-formatting 2020-06-30 20:36:30 +02:00
Ines Montani
6da3500728 Fix command substitution 2020-06-30 20:35:51 +02:00
svlandeg
e7aff9c5fc bugfix exec usage in dvc.yaml 2020-06-30 18:51:20 +02:00
svlandeg
39953c7c60 fix print_run_help with new arg order 2020-06-30 17:28:09 +02:00
svlandeg
cd632d8ec2 move folder for exec argument one up 2020-06-30 17:19:36 +02:00
svlandeg
1ae6fa2554 move subcommand one place up as project_dir has default 2020-06-30 16:04:53 +02:00
svlandeg
a46b76f188 use current working dir as default throughout 2020-06-30 15:39:24 +02:00
svlandeg
b228111925 fix funny printing 2020-06-30 14:54:45 +02:00
Ines Montani
8e20505970 Resolve within working_dir context manager 2020-06-30 13:29:45 +02:00
Ines Montani
72175b5c60 Update project command 2020-06-30 13:17:26 +02:00
svlandeg
140c4896a0 split_command util function 2020-06-30 12:54:15 +02:00
svlandeg
d23be563eb remove redundant setting of no_args_is_help 2020-06-30 11:23:35 +02:00
svlandeg
b311ce982f Merge remote-tracking branch 'upstream/develop' into fix/small-edits
# Conflicts:
#	spacy/cli/project.py
2020-06-30 11:17:31 +02:00