Matthew Honnibal
1827f22f56
Set version to v3.0.0a3
2020-07-09 19:38:04 +02:00
Matthw Honnibal
7010f1a2be
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-07-09 19:34:11 +02:00
Matthw Honnibal
77af0a6bb4
Offer option of padding-sensitive batching
2020-07-09 14:50:20 +02:00
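As a rough illustration of what "padding-sensitive" batching can mean: when sequences in a batch are padded to the length of the longest one, the real cost of a batch is `longest * number_of_sequences`, not the raw word count. The sketch below is not taken from this commit; the function name `batch_by_padded_size` and the `max_padded` budget are assumptions made for the example.

```python
from typing import Iterable, Iterator, List, Sequence


def batch_by_padded_size(
    seqs: Iterable[Sequence], max_padded: int
) -> Iterator[List[Sequence]]:
    """Group sequences so that (longest length * batch size) stays within max_padded.

    Hypothetical helper for illustration only. A single sequence longer than
    the budget is still emitted, alone in its own batch.
    """
    batch: List[Sequence] = []
    longest = 0
    for seq in seqs:
        new_longest = max(longest, len(seq))
        # Padded size if this sequence were added to the current batch.
        if batch and new_longest * (len(batch) + 1) > max_padded:
            yield batch
            batch, new_longest = [], len(seq)
        batch.append(seq)
        longest = new_longest
    if batch:
        yield batch


if __name__ == "__main__":
    docs = [["w"] * n for n in (2, 2, 5, 1, 1)]
    for b in batch_by_padded_size(docs, max_padded=6):
        print([len(d) for d in b])
```

Batching by plain word count would treat the five-word sequence like any other; here it forces a new batch because it would inflate the padding of its neighbours.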
Matthw Honnibal
3a7f275c02
Add extra batch util
2020-07-09 14:38:41 +02:00
Matthw Honnibal
eb0798c421
Add __len__ method for Example
2020-07-09 14:38:26 +02:00
Ines Montani
8f9552d9e7
Refactor project CLI ( #5732 )
...
* Make project command a submodule
* Update with WIP
* Add helper for joining commands
* Update docstrings, formatting and types
* Update assets and add support for copying local files
* Fix type
* Update success messages
2020-07-09 01:42:51 +02:00
Adriane Boyd
ad15499b3b
Fix get_loss for values outside of labels in senter ( #5730 )
...
* Fix get_loss for None alignments in senter
When converting the `sent_start` values back to `SentenceRecognizer`
labels, handle `None` alignments.
* Handle SENT_START as -1
Handle SENT_START as -1 (or -1 converted to uint64) by treating any
values other than 1 the same as 0 in `SentenceRecognizer.get_loss`.
2020-07-09 01:41:58 +02:00
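The rule described in this commit (any `sent_start` value other than 1 is treated like 0, and `None` alignments are handled explicitly) can be sketched in plain Python. This is a minimal sketch, not the code in `SentenceRecognizer.get_loss`: the label strings `"START"` and `"OTHER"` are placeholders, and passing `None` through as "no gold label" is an assumption about what handling `None` means.

```python
from typing import List, Optional

UINT64_MINUS_ONE = 2**64 - 1  # what -1 looks like after conversion to uint64


def sent_start_to_labels(sent_starts: List[Optional[int]]) -> List[Optional[str]]:
    """Map per-token sent_start values to placeholder sentence labels."""
    labels: List[Optional[str]] = []
    for value in sent_starts:
        if value is None:
            # Unaligned token: assumed to carry no gold label.
            labels.append(None)
        elif value == 1:
            labels.append("START")
        else:
            # 0, -1, and -1-as-uint64 all fall through to the same label.
            labels.append("OTHER")
    return labels


if __name__ == "__main__":
    print(sent_start_to_labels([1, 0, -1, None, UINT64_MINUS_ONE]))
```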
Matthw Honnibal
1b20ffac38
batch_by_words by default
2020-07-08 21:37:06 +02:00
Matthw Honnibal
93e50da46a
Remove auto 'set_annotation' in training to address GPU memory
2020-07-08 21:36:51 +02:00
Matthw Honnibal
fb8a5967c1
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-07-08 15:27:50 +02:00
Ines Montani
0a3d41bb1d
Deprecate model shortcuts and simplify download ( #5722 )
2020-07-08 14:00:07 +02:00
Adriane Boyd
c9f0f75778
Update get_loss for senter and morphologizer ( #5724 )
...
* Update get_loss for senter
Update `SentenceRecognizer.get_loss` to keep it similar to `Tagger`.
* Update get_loss for morphologizer
Update `Morphologizer.get_loss` to keep it similar to `Tagger`.
2020-07-08 13:59:28 +02:00
Matthw Honnibal
ca989f4cc4
Improve cutting logic in parser
2020-07-08 11:27:54 +02:00
Matthw Honnibal
42e1109def
Support option to not batch by number of words
2020-07-08 11:26:54 +02:00
Ines Montani
8cb7f9ccff
Improve assets and DVC handling ( #5719 )
...
* Improve assets and DVC handling
* Remove outdated comment [ci skip]
2020-07-07 20:51:50 +02:00
Sofie Van Landeghem
a39a110c4e
Few more Example unit tests ( #5720 )
...
* small fixes in Example, UX
* add gold tests for aligned_spans and get_aligned_parse
* sentencizer unnecessary
2020-07-07 18:46:00 +02:00
Matthw Honnibal
433dc3c9c9
Simplify PrecomputableAffine slightly
2020-07-07 17:22:47 +02:00
Matthw Honnibal
a4164f67ca
Don't normalize gradients
2020-07-07 17:21:58 +02:00
Matthw Honnibal
8177f25b6c
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-07-07 17:21:10 +02:00
Ines Montani
fa00a85828
Merge pull request #5715 from explosion/chore/tidy-regression-tests
2020-07-07 11:22:07 +02:00
Matthw Honnibal
d1fd3438c3
Add dropout to parser hidden layer
2020-07-07 01:38:15 +02:00
Matthw Honnibal
f25761e513
Don't randomize cuts in parser
2020-07-06 17:51:25 +02:00
Matthw Honnibal
709fc5e4ad
Clarify dropout and seed in Tok2Vec
2020-07-06 17:50:21 +02:00
Matthew Honnibal
19d42f42de
Set version to v3.0.0a2
2020-07-06 17:43:12 +02:00
Matthew Honnibal
cc477be952
Improve gold-standard alignment ( #5711 )
...
* Remove previous alignment
* Implement better alignment, using ragged data structure
* Use pytokenizations for alignment
* Fixes
* Fixes
* Fix overlapping entities in alignment
* Fix align split_sents
* Update test
* Commit align.py
* Try to appease setuptools
* Fix flake8
* use realistic entities for testing
* Update tests for better alignment
* Improve alignment heuristic
Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
2020-07-06 17:39:31 +02:00
Ines Montani
b6deef80f8
Fix class so pickling works as expected
2020-07-06 16:43:45 +02:00
Ines Montani
fa261d09e8
Add alternative CLI option
2020-07-06 15:57:38 +02:00
Adriane Boyd
c67fc6aa5b
Make docs_to_json backwards-compatible with v2 ( #5714 )
...
* In `spacy convert -t json` output the JSON docs wrapped in a list
* Add back token-level `ner` alongside the doc-level `entities`
2020-07-06 14:15:00 +02:00
Ines Montani
5b7b2a498d
Tidy up and merge regression tests
2020-07-06 14:05:59 +02:00
Ines Montani
412dbb1f38
Remove dead and/or deprecated code ( #5710 )
...
* Remove dead and/or deprecated code
* Remove n_threads
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-07-06 13:06:25 +02:00
Sofie Van Landeghem
fcbf899b08
Feature/example only ( #5707 )
...
* remove _convert_examples
* fix test_gold, raise TypeError if tuples are used instead of Example objects
* throwing proper errors when the wrong type of objects are passed
* fix deprecated format in tests
* fix deprecated format in parser tests
* fix tests for NEL, morph, senter, tagger, textcat
* update regression tests with new Example format
* use make_doc
* more fixes to nlp.update calls
* few more small fixes for rehearse and evaluate
* only import ml_datasets if really necessary
2020-07-06 13:02:36 +02:00
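The type check mentioned in this commit (raising `TypeError` when tuples are passed where `Example` objects are expected) might look roughly like the sketch below. The `Example` class here is a bare stand-in, and `check_examples` is a hypothetical helper name, not a function confirmed by this log.

```python
from typing import Iterable, List


class Example:
    """Bare stand-in for spaCy's Example, for illustration only."""

    def __init__(self, predicted: str, reference: str) -> None:
        self.predicted = predicted
        self.reference = reference


def check_examples(examples: Iterable[object], method: str) -> List[Example]:
    """Return the examples as a list, or raise TypeError on any non-Example item."""
    checked: List[Example] = []
    for eg in examples:
        if not isinstance(eg, Example):
            raise TypeError(
                f"{method} expects Example objects, got {type(eg).__name__}"
            )
        checked.append(eg)
    return checked


if __name__ == "__main__":
    check_examples([Example("a b", "a b")], "nlp.update")  # passes
    try:
        check_examples([("a b", {"tags": ["X", "Y"]})], "nlp.update")
    except TypeError as err:
        print(err)
```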
Matthw Honnibal
3f6f087113
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-07-04 23:52:12 +02:00
Matthw Honnibal
5642507823
Fix has_unknown_spaces in Doc.copy
2020-07-04 23:52:02 +02:00
Matthw Honnibal
8870a6ded7
Specify seeds in HashEmbed
2020-07-04 23:51:49 +02:00
Ines Montani
37c3bb35e2
Auto-format
2020-07-04 16:25:34 +02:00
Ines Montani
abd173937f
Auto-format and update URL
2020-07-04 14:23:44 +02:00
Ines Montani
99aff16d60
Make argument shortcut consistent
2020-07-04 14:23:32 +02:00
Matthew Honnibal
2bd1bf81f1
Refactor pretrain and support character-based objective for v3 ( #5706 )
...
* Start adding character-based stuff
* Start adding character-based objective
* Start adding character-based stuff
* Start adding character-based objective
* Remove outdated comment
* Update pretraining models
* Add/fix character-based multi-task models
* Refactor pretrain and support character-based objective
* Update pretrain config
* Remove unused
* Fix flake8 errors
* Clean up imports
* Format
* Format
* Update Thinc version
* Raise error if vectors objective but no vectors
2020-07-03 17:57:28 +02:00
Ines Montani
84fb3a3fb3
Auto-format and fix tuple
2020-07-03 15:20:10 +02:00
Matthew Honnibal
e1b3e8ee11
Set version to v3.0.0a1
2020-07-03 13:21:08 +02:00
Matthew Honnibal
a902b5f217
Record whether Doc objects are built from known spacing ( #5697 )
...
* Tell convert CLI to store user data for Doc
* Remove assert
* Add has_unknown_spaces flag on Doc
* Do not tokenize docs with unknown spaces in Corpus
* Handle conversion of unknown spaces in Example
* Fixes
* Fixes
* Draft has_known_spaces support in DocBin
* Add test for serialize has_unknown_spaces
* Fix DocBin serialization when has_unknown_spaces
* Use serialization in test
2020-07-03 12:58:16 +02:00
Adriane Boyd
abad56db7d
Add conllu2docs converter ( #5704 )
...
Add conllu2docs converter adapted from conllu2json converter
2020-07-03 12:54:32 +02:00
Jan Jessewitsch
e4dcac4a4b
Merging multiple docs into one ( #5032 )
...
* Add static method to Doc to allow merging of multiple docs.
* Add error description for the error that occurs if docs with different
vocabs (from different languages) are merged in Doc.from_docs().
* Add test for Doc.from_docs() implementation.
* Fix using numpy's concatenate in Doc.from_docs.
* Replace typing's type annotations in from_docs.
* Simply remove type annotations in from_docs.
* Add documentation for Doc.from_docs to api.
* Simplify from_docs, its test and the api doc for codebase consistency.
* Fix merging of Doc objects that end with whitespaces (Achieved by simply not setting the SPACY attribute on whitespace tokens). Remove two unnecessary imports of attributes.
* Add merging of user data from Doc objects in from_docs. Add user data test case to corresponding test. Add applicable warning messages.
* Fix incorrect setting of tokens idx by using concatenated spaces (again). Add test case to corresponding test.
* Add MORPH to attrs
* Update warnings calls
* Remove out-dated error from merge
* Rename space_delimiter to ensure_whitespace
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-07-03 11:32:42 +02:00
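To make the whitespace handling in this commit concrete, here is a small pure-Python sketch of merging tokenized documents. It does not use spaCy and is not the `Doc.from_docs` implementation: a document is modelled as a `(words, spaces)` pair, and `merge_docs` and `to_text` are hypothetical names. Only the `ensure_whitespace` parameter name comes from the commit message.

```python
from typing import List, Tuple

TokenDoc = Tuple[List[str], List[bool]]  # (words, trailing-space flag per word)


def merge_docs(docs: List[TokenDoc], ensure_whitespace: bool = True) -> TokenDoc:
    """Concatenate documents, optionally separating them with a space."""
    words: List[str] = []
    spaces: List[bool] = []
    for i, (doc_words, doc_spaces) in enumerate(docs):
        if len(doc_words) != len(doc_spaces):
            raise ValueError("words and spaces must have the same length")
        doc_spaces = list(doc_spaces)  # copy so the input is not mutated
        is_last = i == len(docs) - 1
        if ensure_whitespace and not is_last and doc_words:
            # A document that already ends in a whitespace token needs no
            # extra trailing space.
            if not doc_words[-1].isspace():
                doc_spaces[-1] = True
        words.extend(doc_words)
        spaces.extend(doc_spaces)
    return words, spaces


def to_text(doc: TokenDoc) -> str:
    """Rebuild the text from words and trailing-space flags."""
    return "".join(w + (" " if s else "") for w, s in zip(*doc))


if __name__ == "__main__":
    a = (["Hello", "world", "."], [True, False, False])
    b = (["Bye", "!"], [False, False])
    print(to_text(merge_docs([a, b])))
```

Token character offsets in the merged document follow from the concatenated words and spaces, which is why the spaces have to be fixed up before offsets are computed.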
Sofie Van Landeghem
41b65fd0f8
fix to pretrain script ( #5699 )
...
* fix to pretrain script
* remove unnecessary import
2020-07-02 21:48:01 +02:00
Adriane Boyd
a723fa02a1
DocBin: add version number, missing attributes and strings ( #5685 )
...
* Add version number to DocBin
Add a version number to DocBin for future use.
* Add POS to all attributes in DocBin
* Add morph string to strings in DocBin
* Update DocBin API
* Add string for ENT_KB_ID in DocBin
2020-07-02 17:41:50 +02:00
Ines Montani
d36632553a
Merge pull request #5688 from explosion/remove-deprecated
...
Remove deprecated methods: Doc.print_tree, Doc.merge, Span.merge
2020-07-02 15:10:30 +02:00
Ines Montani
8a5b9a6d5f
Merge pull request #5693 from svlandeg/bugfix/nel-v3
2020-07-02 14:45:46 +02:00
Ines Montani
ee8a830248
Merge pull request #5687 from svlandeg/bugfix/init-model
...
Fixing init_model
2020-07-02 14:10:28 +02:00
svlandeg
04ed4d60a8
raise error when links are not aligned to tokens
2020-07-02 13:57:35 +02:00
svlandeg
f503817623
fix parsing entity links in new gold format
2020-07-02 13:48:11 +02:00
Ines Montani
60c2695131
Remove deprecated methods
2020-07-01 22:33:39 +02:00
Ines Montani
fe4cfd0632
Start updating website for v3 [ci skip]
2020-07-01 21:26:39 +02:00
svlandeg
a30bc77415
bugfixing prune_vectors and vectors_loc
2020-07-01 21:00:47 +02:00
Matthw Honnibal
94a0cf46fd
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-07-01 18:45:45 +02:00
Matthw Honnibal
6a0a27e5c2
Fix max_steps
2020-07-01 18:08:14 +02:00
Ines Montani
8d90e44d74
Fix title
2020-07-01 15:38:01 +02:00
Ines Montani
8fb574900a
Update parent package and version
2020-07-01 15:35:23 +02:00
Matthew Honnibal
0ada186dda
Set version to v3.0.0.dev14
2020-07-01 15:31:04 +02:00
Matthw Honnibal
cb51bb637b
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-07-01 15:17:27 +02:00
Matthw Honnibal
7734cbc34d
Set batch size in begin_training
2020-07-01 15:16:59 +02:00
Matthw Honnibal
1f7709e9a6
Improve max length check in corpus
2020-07-01 15:16:43 +02:00
Matthw Honnibal
2fa56484b2
Fix eval batch size
2020-07-01 15:16:25 +02:00
Matthw Honnibal
c5d12d1a22
Allow batch size to be set for evaluation in spacy train
2020-07-01 15:04:36 +02:00
Matthw Honnibal
f5532757a3
Filter out 0-length examples in Corpus
2020-07-01 15:02:37 +02:00
Ines Montani
bc87ba97e0
Merge pull request #5681 from svlandeg/bugfix/exec-cwd
2020-07-01 14:13:19 +02:00
Matthw Honnibal
52338a07bb
Set version to v3.0.0.dev13
2020-07-01 02:49:17 +02:00
Matthw Honnibal
fa6d473390
Fix parser maxout_pieces=1
2020-07-01 02:48:58 +02:00
Matthw Honnibal
35af5819e0
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-07-01 01:03:39 +02:00
Matthw Honnibal
0d6edf5397
Clean up debug code in transition_system
2020-07-01 01:03:20 +02:00
Matthw Honnibal
a1b6add4c8
Fix parser gold cutting and gradient normalization
2020-07-01 01:02:58 +02:00
Matthw Honnibal
8c5a88e777
Fix per-epoch shuffling
2020-07-01 01:02:35 +02:00
svlandeg
a7d547c65e
small fix
2020-06-30 21:56:17 +02:00
svlandeg
8eca7e995e
add try-except to git commands to get an informative warning
2020-06-30 21:53:40 +02:00
Ines Montani
b032943c34
Fix funny printing again
2020-06-30 21:33:41 +02:00
Matthw Honnibal
d525552979
Fix efficiency of parser backprop_nonlinearity
2020-06-30 21:22:54 +02:00
Ines Montani
d64644d9d1
Adjust auto-formatting
2020-06-30 20:36:30 +02:00
Ines Montani
6da3500728
Fix command substitution
2020-06-30 20:35:51 +02:00
svlandeg
e7aff9c5fc
bugfix exec usage in dvc.yaml
2020-06-30 18:51:20 +02:00
svlandeg
60f97bc519
add custom warning when run_command fails
2020-06-30 17:28:43 +02:00
svlandeg
39953c7c60
fix print_run_help with new arg order
2020-06-30 17:28:09 +02:00
svlandeg
cd632d8ec2
move folder for exec argument one up
2020-06-30 17:19:36 +02:00
svlandeg
1ae6fa2554
move subcommand one place up as project_dir has default
2020-06-30 16:04:53 +02:00
svlandeg
a46b76f188
use current working dir as default throughout
2020-06-30 15:39:24 +02:00
svlandeg
b228111925
fix funny printing
2020-06-30 14:54:45 +02:00
Ines Montani
8e20505970
Resolve within working_dir context manager
2020-06-30 13:29:45 +02:00
Ines Montani
72175b5c60
Update project command
2020-06-30 13:17:26 +02:00
Ines Montani
c5e31acb06
Make working_dir yield absolute cwd path
2020-06-30 13:17:14 +02:00
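A context manager that changes directory and yields the absolute path of the new working directory could be sketched as follows. This is an illustrative version written from the commit title alone, not the project's own `working_dir`.

```python
import os
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator, Union


@contextmanager
def working_dir(path: Union[str, Path]) -> Iterator[Path]:
    """Temporarily change the working directory and yield its absolute path."""
    prev_cwd = Path.cwd()
    os.chdir(str(path))
    try:
        # Path.cwd() is always absolute, even if `path` was relative.
        yield Path.cwd()
    finally:
        # Restore the previous directory even if the body raised.
        os.chdir(str(prev_cwd))


if __name__ == "__main__":
    with working_dir(".") as cwd:
        print(cwd.is_absolute())
```

Yielding the absolute path lets callers resolve relative file names inside the block without calling `Path.cwd()` themselves.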
Ines Montani
3aca404735
Make run_command take string and list
2020-06-30 13:17:00 +02:00
Ines Montani
7584fdafec
Fix typo
2020-06-30 12:59:13 +02:00
svlandeg
140c4896a0
split_command util function
2020-06-30 12:54:15 +02:00
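The two commits above (a `split_command` utility, and `run_command` accepting a string or a list) suggest a pattern like the one sketched here. The bodies are assumptions built on the standard library's `shlex` and `subprocess`; only the two function names come from the commit titles, and platform-specific quoting rules are ignored.

```python
import shlex
import subprocess
import sys
from typing import List, Union


def split_command(command: str) -> List[str]:
    """Split a shell-style command string into a list of arguments."""
    return shlex.split(command)


def run_command(command: Union[str, List[str]]) -> int:
    """Run a command given as a string or an argument list; return its exit code."""
    if isinstance(command, str):
        command = split_command(command)
    # No shell is involved, so arguments are passed through as-is.
    return subprocess.run(command, check=False).returncode


if __name__ == "__main__":
    print(split_command('python -m spacy convert "my file.json"'))
    print(run_command([sys.executable, "-c", "pass"]))
```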
Matthw Honnibal
57e09747dc
Improve efficiency of get_oracle_sequences
2020-06-30 11:50:48 +02:00
Matthw Honnibal
233945bfe0
Fix init for padding
2020-06-30 11:50:24 +02:00
svlandeg
d23be563eb
remove redundant setting of no_args_is_help
2020-06-30 11:23:35 +02:00
svlandeg
b311ce982f
Merge remote-tracking branch 'upstream/develop' into fix/small-edits
...
# Conflicts:
# spacy/cli/project.py
2020-06-30 11:17:31 +02:00
svlandeg
7e4cbda89a
fix project_init for relative path
2020-06-30 11:09:53 +02:00
Matthw Honnibal
85ed5730a2
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-06-30 01:14:16 +02:00
Ines Montani
e8033df81e
Also handle python3 and pip3
2020-06-29 20:30:42 +02:00
Ines Montani
c874dde66c
Show help on "spacy project"
2020-06-29 20:11:34 +02:00
Ines Montani
1d2c646e57
Fix init and remove .dvc/plots
2020-06-29 20:07:21 +02:00
Matthw Honnibal
5bed6fc431
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-06-29 19:55:24 +02:00