Ines Montani
84fb3a3fb3
Auto-format and fix tuple
2020-07-03 15:20:10 +02:00
Matthew Honnibal
e1b3e8ee11
Set version to v3.0.0a1
2020-07-03 13:21:08 +02:00
Matthew Honnibal
a902b5f217
Record whether Doc objects are built from known spacing ( #5697 )
...
* Tell convert CLI to store user data for Doc
* Remove assert
* Add has_unknown_spaces flag on Doc
* Do not tokenize docs with unknown spaces in Corpus
* Handle conversion of unknown spaces in Example
* Fixes
* Fixes
* Draft has_known_spaces support in DocBin
* Add test for serialize has_unknown_spaces
* Fix DocBin serialization when has_unknown_spaces
* Use serialization in test
2020-07-03 12:58:16 +02:00
Adriane Boyd
abad56db7d
Add conllu2docs converter ( #5704 )
...
Add conllu2docs converter adapted from conllu2json converter
2020-07-03 12:54:32 +02:00
Jan Jessewitsch
e4dcac4a4b
Merging multiple docs into one ( #5032 )
...
* Add static method to Doc to allow merging of multiple docs.
* Add error description for the error that occurs if docs with different
vocabs (from different languages) are merged in Doc.from_docs().
* Add test for Doc.from_docs() implementation.
* Fix using numpy's concatenate in Doc.from_docs.
* Replace typing's type annotations in from_docs.
* Simply remove type annotations in from_docs.
* Add documentation for Doc.from_docs to api.
* Simplify from_docs, its test and the api doc for codebase consistency.
* Fix merging of Doc objects that end with whitespaces (Achieved by simply not setting the SPACY attribute on whitespace tokens). Remove two unnecessary imports of attributes.
* Add merging of user data from Doc objects in from_docs. Add user data test case to corresponding test. Add applicable warning messages.
* Fix incorrect setting of tokens idx by using concatenated spaces (again). Add test case to corresponding test.
* Add MORPH to attrs
* Update warnings calls
* Remove out-dated error from merge
* Rename space_delimiter to ensure_whitespace
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-07-03 11:32:42 +02:00
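The merging behaviour described in this entry (concatenated spaces, recomputed token offsets, no extra space after whitespace tokens) can be sketched in plain Python. This is only an illustration and not spaCy's implementation: `SimpleDoc`, `merge_docs`, `token_offsets` and `to_text` are hypothetical names, and a doc is modelled as a list of (text, trailing_space) pairs. Only the `ensure_whitespace` parameter name is taken from the commit message.

```python
from typing import List, Tuple

# Hypothetical stand-in for a Doc: a list of (token text, trailing space) pairs.
SimpleDoc = List[Tuple[str, bool]]


def merge_docs(docs: List[SimpleDoc], ensure_whitespace: bool = True) -> SimpleDoc:
    """Concatenate docs, optionally forcing a space between adjacent docs."""
    merged: SimpleDoc = []
    for i, doc in enumerate(docs):
        tokens = list(doc)
        is_last = i == len(docs) - 1
        if ensure_whitespace and tokens and not is_last:
            text, space = tokens[-1]
            # A whitespace-only token already separates the docs, so it is
            # left alone; any other final token gets a trailing space.
            tokens[-1] = (text, space or not text.isspace())
        merged.extend(tokens)
    return merged


def token_offsets(doc: SimpleDoc) -> List[int]:
    """Recompute each token's character offset from the concatenated spaces."""
    offsets, pos = [], 0
    for text, space in doc:
        offsets.append(pos)
        pos += len(text) + (1 if space else 0)
    return offsets


def to_text(doc: SimpleDoc) -> str:
    """Rebuild the text from tokens and their trailing spaces."""
    return "".join(text + (" " if space else "") for text, space in doc)
```

Deriving the offsets from the merged spaces, rather than copying them from the input docs, keeps them consistent with the merged text.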
Sofie Van Landeghem
41b65fd0f8
fix to pretrain script ( #5699 )
...
* fix to pretrain script
* remove unnecessary import
2020-07-02 21:48:01 +02:00
Adriane Boyd
a723fa02a1
DocBin: add version number, missing attributes and strings ( #5685 )
...
* Add version number to DocBin
Add a version number to DocBin for future use.
* Add POS to all attributes in DocBin
* Add morph string to strings in DocBin
* Update DocBin API
* Add string for ENT_KB_ID in DocBin
2020-07-02 17:41:50 +02:00
Ines Montani
d36632553a
Merge pull request #5688 from explosion/remove-deprecated
...
Remove deprecated methods: Doc.print_tree, Doc.merge, Span.merge
2020-07-02 15:10:30 +02:00
Ines Montani
8a5b9a6d5f
Merge pull request #5693 from svlandeg/bugfix/nel-v3
2020-07-02 14:45:46 +02:00
Ines Montani
ee8a830248
Merge pull request #5687 from svlandeg/bugfix/init-model
...
Fixing init_model
2020-07-02 14:10:28 +02:00
svlandeg
04ed4d60a8
raise error when links are not aligned to tokens
2020-07-02 13:57:35 +02:00
svlandeg
f503817623
fix parsing entity links in new gold format
2020-07-02 13:48:11 +02:00
Ines Montani
60c2695131
Remove deprecated methods
2020-07-01 22:33:39 +02:00
Ines Montani
fe4cfd0632
Start updating website for v3 [ci skip]
2020-07-01 21:26:39 +02:00
svlandeg
a30bc77415
bugfixing prune_vectors and vectors_loc
2020-07-01 21:00:47 +02:00
Matthw Honnibal
94a0cf46fd
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-07-01 18:45:45 +02:00
Matthw Honnibal
6a0a27e5c2
Fix max_steps
2020-07-01 18:08:14 +02:00
Ines Montani
8d90e44d74
Fix title
2020-07-01 15:38:01 +02:00
Ines Montani
8fb574900a
Update parent package and version
2020-07-01 15:35:23 +02:00
Matthew Honnibal
0ada186dda
Set version to v3.0.0.dev14
2020-07-01 15:31:04 +02:00
Matthw Honnibal
cb51bb637b
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-07-01 15:17:27 +02:00
Matthw Honnibal
7734cbc34d
Set batch size in begin_training
2020-07-01 15:16:59 +02:00
Matthw Honnibal
1f7709e9a6
Improve max length check in corpus
2020-07-01 15:16:43 +02:00
Matthw Honnibal
2fa56484b2
Fix eval batch size
2020-07-01 15:16:25 +02:00
Matthw Honnibal
c5d12d1a22
Allow batch size to be set for evaluation in spacy train
2020-07-01 15:04:36 +02:00
Matthw Honnibal
f5532757a3
Filter out 0-length examples in Corpus
2020-07-01 15:02:37 +02:00
Ines Montani
bc87ba97e0
Merge pull request #5681 from svlandeg/bugfix/exec-cwd
2020-07-01 14:13:19 +02:00
Matthw Honnibal
52338a07bb
Set version to v3.0.0.dev13
2020-07-01 02:49:17 +02:00
Matthw Honnibal
fa6d473390
Fix parser maxout_pieces=1
2020-07-01 02:48:58 +02:00
Matthw Honnibal
35af5819e0
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-07-01 01:03:39 +02:00
Matthw Honnibal
0d6edf5397
Clean up debug code in transition_system
2020-07-01 01:03:20 +02:00
Matthw Honnibal
a1b6add4c8
Fix parser gold cutting and gradient normalization
2020-07-01 01:02:58 +02:00
Matthw Honnibal
8c5a88e777
Fix per-epoch shuffling
2020-07-01 01:02:35 +02:00
svlandeg
a7d547c65e
small fix
2020-06-30 21:56:17 +02:00
svlandeg
8eca7e995e
add try-except to git commands to get an informative warning
2020-06-30 21:53:40 +02:00
Ines Montani
b032943c34
Fix funny printing again
2020-06-30 21:33:41 +02:00
Matthw Honnibal
d525552979
Fix efficiency of parser backprop_nonlinearity
2020-06-30 21:22:54 +02:00
Ines Montani
d64644d9d1
Adjust auto-formatting
2020-06-30 20:36:30 +02:00
Ines Montani
6da3500728
Fix command substitution
2020-06-30 20:35:51 +02:00
svlandeg
e7aff9c5fc
bugfix exec usage in dvc.yaml
2020-06-30 18:51:20 +02:00
svlandeg
60f97bc519
add custom warning when run_command fails
2020-06-30 17:28:43 +02:00
svlandeg
39953c7c60
fix print_run_help with new arg order
2020-06-30 17:28:09 +02:00
svlandeg
cd632d8ec2
move folder for exec argument one up
2020-06-30 17:19:36 +02:00
svlandeg
1ae6fa2554
move subcommand one place up as project_dir has default
2020-06-30 16:04:53 +02:00
svlandeg
a46b76f188
use current working dir as default throughout
2020-06-30 15:39:24 +02:00
svlandeg
b228111925
fix funny printing
2020-06-30 14:54:45 +02:00
Ines Montani
8e20505970
Resolve within working_dir context manager
2020-06-30 13:29:45 +02:00
Ines Montani
72175b5c60
Update project command
2020-06-30 13:17:26 +02:00
Ines Montani
c5e31acb06
Make working_dir yield absolute cwd path
2020-06-30 13:17:14 +02:00
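A context manager that switches the working directory and yields its absolute path might look roughly like the sketch below. It uses only the standard library and assumes the behaviour implied by the commit title; the actual body of spaCy's `working_dir` is not shown in this log.

```python
import os
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator, Union


@contextmanager
def working_dir(path: Union[str, Path]) -> Iterator[Path]:
    """Temporarily change the cwd and yield the new absolute path."""
    prev_cwd = Path.cwd()
    current = Path(path).resolve()  # absolute, with symlinks resolved
    os.chdir(str(current))
    try:
        yield current
    finally:
        # Always restore the previous directory, even if the body raised.
        os.chdir(str(prev_cwd))
```

Yielding the resolved path lets callers build paths relative to the project directory without calling `Path.cwd()` again.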
Ines Montani
3aca404735
Make run_command take string and list
2020-06-30 13:17:00 +02:00
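Accepting a command either as a single string or as a list of arguments can be sketched as follows. This is a minimal standard-library illustration, not the code from the commit; the platform check reflects the "fix shlex.split for non-posix" entry elsewhere in this log, and the return value (the exit code) is an assumption made for the example.

```python
import shlex
import subprocess
import sys
from typing import List, Union


def split_command(command: str) -> List[str]:
    """Split a command string, using non-POSIX rules on Windows."""
    return shlex.split(command, posix=not sys.platform.startswith("win"))


def run_command(command: Union[str, List[str]]) -> int:
    """Run a command given as a string or as a list of arguments."""
    if isinstance(command, str):
        command = split_command(command)
    # check=False: report the exit code instead of raising on failure.
    result = subprocess.run(command, check=False)
    return result.returncode
```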
Ines Montani
7584fdafec
Fix typo
2020-06-30 12:59:13 +02:00
svlandeg
140c4896a0
split_command util function
2020-06-30 12:54:15 +02:00
Matthw Honnibal
57e09747dc
Improve efficiency of get_oracle_sequences
2020-06-30 11:50:48 +02:00
Matthw Honnibal
233945bfe0
Fix init for padding
2020-06-30 11:50:24 +02:00
svlandeg
d23be563eb
remove redundant setting of no_args_is_help
2020-06-30 11:23:35 +02:00
svlandeg
b311ce982f
Merge remote-tracking branch 'upstream/develop' into fix/small-edits
...
# Conflicts:
# spacy/cli/project.py
2020-06-30 11:17:31 +02:00
svlandeg
7e4cbda89a
fix project_init for relative path
2020-06-30 11:09:53 +02:00
Matthw Honnibal
85ed5730a2
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-06-30 01:14:16 +02:00
Ines Montani
e8033df81e
Also handle python3 and pip3
2020-06-29 20:30:42 +02:00
Ines Montani
c874dde66c
Show help on "spacy project"
2020-06-29 20:11:34 +02:00
Ines Montani
1d2c646e57
Fix init and remove .dvc/plots
2020-06-29 20:07:21 +02:00
Matthw Honnibal
5bed6fc431
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-06-29 19:55:24 +02:00
svlandeg
1176783310
fix one more shlex.split
2020-06-29 18:37:42 +02:00
svlandeg
ff233d5743
print details on error msg (e.g. PermissionError on specific file)
2020-06-29 18:22:33 +02:00
svlandeg
894b8e7ff6
throw warning (instead of crashing) when temp dir can't be cleaned
2020-06-29 18:16:39 +02:00
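Warning instead of crashing when a temporary directory cannot be removed could be done along these lines. The helper below is a hypothetical sketch using only the standard library; the warning text and the choice to catch `OSError` are assumptions, not taken from the commit.

```python
import shutil
import tempfile
import warnings
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def make_tempdir() -> Iterator[Path]:
    """Yield a temporary directory and warn if it cannot be removed."""
    path = Path(tempfile.mkdtemp())
    try:
        yield path
    finally:
        try:
            shutil.rmtree(str(path))
        except OSError as err:  # PermissionError is a subclass of OSError
            # Cleanup failure is not fatal: report it and carry on.
            warnings.warn(f"Could not clean up temp dir {path}: {err}")
```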
svlandeg
efe7eb71f2
create subfolder in working dir
2020-06-29 17:46:08 +02:00
svlandeg
3487214ba1
fix shlex.split for non-posix
2020-06-29 17:45:47 +02:00
Ines Montani
126050f259
Improve asset fetching
...
Get all paths first and run dvc add once so it only shows one progress bar and one combined git command (if repo is git repo)
2020-06-29 16:55:24 +02:00
Ines Montani
7c08713baa
Improve error messages
2020-06-29 16:54:47 +02:00
Ines Montani
24664efa23
Import project_run_all function
2020-06-29 16:54:19 +02:00
svlandeg
f8dddeda27
print help msg when just calling 'project' without args
2020-06-29 16:38:15 +02:00
svlandeg
bf43ebbf61
fix typos
2020-06-29 16:32:25 +02:00
Matthew Honnibal
67928036f2
Set version to v3.0.0.dev12
2020-06-29 14:45:43 +02:00
Sofie Van Landeghem
8d3c0306e1
refactor fixes ( #5664 )
...
* fixes in ud_train, UX for morphs
* update pyproject with new version of thinc
* fixes in debug_data script
* cleanup of old unused error messages
* remove obsolete TempErrors
* move error messages to errors.py
* add ENT_KB_ID to default DocBin serialization
* few fixes to simple_ner
* fix tags
2020-06-29 14:33:00 +02:00
Sofie Van Landeghem
fc3cb1fa9e
NER align tests ( #5656 )
...
* one_to_many works better. misalignment doesn't yet.
* fix tests
* restore example
* xfail alignment tests
2020-06-29 13:59:17 +02:00
Matthew Honnibal
2d9604d39c
Set version to v3.0.0.dev11
2020-06-29 13:56:46 +02:00
Matthw Honnibal
da50473701
Tweak efficiency of arc_eager.set_costs
2020-06-29 12:17:41 +02:00
Ines Montani
bac8a8d766
Merge branch 'feature/project-cli' into develop
2020-06-29 10:49:05 +02:00
Matthew Honnibal
e14bf9decb
Set version to v3.0.0.dev9
2020-06-28 23:58:10 +02:00
Matthew Honnibal
58c8f731bd
Set version to v3.0.0.dev9
2020-06-28 23:53:14 +02:00
Ines Montani
569376e34e
Replace curl with requests
2020-06-28 16:25:53 +02:00
Ines Montani
dbe86b3453
Update project.py
2020-06-28 15:45:19 +02:00
Ines Montani
dbfa292ed3
Output more stats in evaluate
2020-06-28 15:34:28 +02:00
Ines Montani
90b7fa8fed
Run DVC command in project dir
2020-06-28 15:33:53 +02:00
Ines Montani
2f6ee0d018
Tidy up, document and add custom clone logic
2020-06-28 15:08:35 +02:00
Matthew Honnibal
dc7a9be9f8
Merge branch 'feature/project-cli' of https://github.com/explosion/spaCy into feature/project-cli
2020-06-28 14:07:53 +02:00
Matthew Honnibal
e08257d401
Add example of how to do sparse-checkout
2020-06-28 14:07:32 +02:00
Ines Montani
1b331237aa
Update hashing and config update
2020-06-28 13:17:19 +02:00
Ines Montani
f385344286
Update asset logic and add import-url
2020-06-28 13:07:31 +02:00
Ines Montani
d6aa4cb478
Update asset logic
2020-06-28 12:40:11 +02:00
Ines Montani
ed46951842
Update
2020-06-28 12:24:59 +02:00
Ines Montani
d54f33441a
Merge branch 'feature/project-cli' of https://github.com/explosion/spaCy into feature/project-cli
2020-06-27 21:17:00 +02:00
Ines Montani
cd0dd78276
Simplify model loading (now supported via load_model)
2020-06-27 21:16:57 +02:00
Matthew Honnibal
8e3baebdce
Merge branch 'feature/project-cli' of https://github.com/explosion/spaCy into feature/project-cli
2020-06-27 21:16:18 +02:00
Matthew Honnibal
d8c70b415e
Fix Example usage in evaluate
2020-06-27 21:15:25 +02:00
Ines Montani
e33d2b1bea
Add success message
2020-06-27 21:15:13 +02:00
Ines Montani
42eb381ec6
Improve output handling in evaluate
2020-06-27 21:13:11 +02:00
Ines Montani
df22d490b1
Tidy up types
2020-06-27 21:13:06 +02:00
Ines Montani
6678bd80c2
Check if deps exist in non-DVC commands
2020-06-27 20:57:26 +02:00
Ines Montani
fe06697150
Fix package command and add version option
2020-06-27 20:36:08 +02:00