Ines Montani
8cb7f9ccff
Improve assets and DVC handling ( #5719 )
* Improve assets and DVC handling
* Remove outdated comment [ci skip]
2020-07-07 20:51:50 +02:00
Ines Montani
2298e129e6
Update example and training docs
2020-07-07 20:30:12 +02:00
svlandeg
2b60e894cb
fix component constructors, update, begin_training, reference to GoldParse
2020-07-07 19:17:19 +02:00
Sofie Van Landeghem
a39a110c4e
Few more Example unit tests ( #5720 )
* small fixes in Example, UX
* add gold tests for aligned_spans and get_aligned_parse
* sentencizer unnecessary
2020-07-07 18:46:00 +02:00
Matthw Honnibal
433dc3c9c9
Simplify PrecomputableAffine slightly
2020-07-07 17:22:47 +02:00
Matthw Honnibal
a4164f67ca
Don't normalize gradients
2020-07-07 17:21:58 +02:00
Matthw Honnibal
8177f25b6c
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-07-07 17:21:10 +02:00
svlandeg
14a796e3f9
add Example API with examples of Example usage
2020-07-07 14:46:41 +02:00
Ines Montani
fa00a85828
Merge pull request #5715 from explosion/chore/tidy-regression-tests
2020-07-07 11:22:07 +02:00
Matthw Honnibal
d1fd3438c3
Add dropout to parser hidden layer
2020-07-07 01:38:15 +02:00
Ines Montani
bb3ee38cf9
Update WIP
2020-07-06 22:22:37 +02:00
Ines Montani
44da24ddd0
Update doc.md
2020-07-06 18:17:00 +02:00
Ines Montani
44790c1c32
Update docs and add keyword-only tag
2020-07-06 18:14:57 +02:00
Matthw Honnibal
1eb1654941
Update configs
2020-07-06 17:51:37 +02:00
Matthw Honnibal
f25761e513
Don't randomize cuts in parser
2020-07-06 17:51:25 +02:00
Matthw Honnibal
709fc5e4ad
Clarify dropout and seed in Tok2Vec
2020-07-06 17:50:21 +02:00
Matthew Honnibal
19d42f42de
Set version to v3.0.0a2
2020-07-06 17:43:12 +02:00
Matthew Honnibal
cc477be952
Improve gold-standard alignment ( #5711 )
* Remove previous alignment
* Implement better alignment, using ragged data structure
* Use pytokenizations for alignment
* Fixes
* Fixes
* Fix overlapping entities in alignment
* Fix align split_sents
* Update test
* Commit align.py
* Try to appease setuptools
* Fix flake8
* use realistic entities for testing
* Update tests for better alignment
* Improve alignment heuristic
Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
2020-07-06 17:39:31 +02:00
Mike Izbicki
7a2ca00794
fix bug in Korean language, resulting in 100x speedup by reducing overhead of mecab ( #5701 )
* speed up Korean nlp 100x by stopping mecab from reloading on each doc
* add contributor agreement
* rename variables to improve code readability
2020-07-06 17:03:33 +02:00
Ines Montani
b6deef80f8
Fix class so pickling works as expected
2020-07-06 16:43:45 +02:00
Ines Montani
a35236e5f0
Update v3 docs WIP [ci skip]
2020-07-06 15:57:44 +02:00
Ines Montani
fa261d09e8
Add alternative CLI option
2020-07-06 15:57:38 +02:00
Adriane Boyd
c67fc6aa5b
Make `docs_to_json` backwards-compatible with v2 ( #5714 )
* In `spacy convert -t json` output the JSON docs wrapped in a list
* Add back token-level `ner` alongside the doc-level `entities`
2020-07-06 14:15:00 +02:00
Ines Montani
5b7b2a498d
Tidy up and merge regression tests
2020-07-06 14:05:59 +02:00
Ines Montani
412dbb1f38
Remove dead and/or deprecated code ( #5710 )
* Remove dead and/or deprecated code
* Remove n_threads
Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2020-07-06 13:06:25 +02:00
Sofie Van Landeghem
fcbf899b08
Feature/example only ( #5707 )
* remove _convert_examples
* fix test_gold, raise TypeError if tuples are used instead of Example's
* throwing proper errors when the wrong type of objects are passed
* fix deprecated format in tests
* fix deprecated format in parser tests
* fix tests for NEL, morph, senter, tagger, textcat
* update regression tests with new Example format
* use make_doc
* more fixes to nlp.update calls
* few more small fixes for rehearse and evaluate
* only import ml_datasets if really necessary
2020-07-06 13:02:36 +02:00
Ines Montani
63247cbe87
Update v3 docs [ci skip]
2020-07-05 16:11:16 +02:00
graue70
9860b8399e
Fix typo in test function docstring ( #5696 )
2020-07-05 15:49:06 +02:00
Matthew Honnibal
3e78e82a83
Experimental character-based pretraining ( #5700 )
* Use cosine loss in Cloze multitask
* Fix char_embed for gpu
* Call resume_training for base model in train CLI
* Fix bilstm_depth default in pretrain command
* Implement character-based pretraining objective
* Use chars loss in ClozeMultitask
* Add method to decode predicted characters
* Fix number characters
* Rescale gradients for mlm
* Fix char embed+vectors in ml
* Fix pipes
* Fix pretrain args
* Move get_characters_loss
* Fix import
* Fix import
* Mention characters loss option in pretrain
* Remove broken 'self attention' option in pretrain
* Revert "Remove broken 'self attention' option in pretrain"
This reverts commit 56b820f6af.
* Document 'characters' objective of pretrain
2020-07-05 15:48:39 +02:00
Matthw Honnibal
3f6f087113
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-07-04 23:52:12 +02:00
Matthw Honnibal
5642507823
Fix has_unknown_spaces in Doc.copy
2020-07-04 23:52:02 +02:00
Matthw Honnibal
8870a6ded7
Specify seeds in HashEmbed
2020-07-04 23:51:49 +02:00
Ines Montani
dc8c9d912f
Update docs [ci skip]
2020-07-04 16:47:24 +02:00
Ines Montani
37c3bb35e2
Auto-format
2020-07-04 16:25:34 +02:00
Ines Montani
4498dfe99d
Update docs
2020-07-04 16:25:30 +02:00
Ines Montani
2d9ca0cd8b
Make Thinc version consistent
2020-07-04 14:39:34 +02:00
Ines Montani
6a5095621a
Merge branch 'nightly.spacy.io' into develop
2020-07-04 14:23:55 +02:00
Ines Montani
abd173937f
Auto-format and update URL
2020-07-04 14:23:44 +02:00
Ines Montani
99aff16d60
Make argument shortcut consistent
2020-07-04 14:23:32 +02:00
Ines Montani
1e0d54edd1
Update docs
2020-07-04 14:23:10 +02:00
Matthew Honnibal
2bd1bf81f1
Refactor pretrain and support character-based objective for v3 ( #5706 )
* Start adding character-based stuff
* Start adding character-based objective
* Start adding character-based stuff
* Start adding character-based objective
* Remove outdated comment
* Update pretraining models
* Add/fix character-based multi-task models
* Refactor pretrain and support character-based objective
* Update pretrain config
* Remove unused
* Fix flake8 errors
* Clean up imports
* Format
* Format
* Update Thinc version
* Raise error if vectors objective but no vectors
2020-07-03 17:57:28 +02:00
Ines Montani
fe224dc2dd
Merge branch 'develop' into nightly.spacy.io
2020-07-03 16:48:27 +02:00
Ines Montani
06f1ecb308
Update v3 docs
2020-07-03 16:48:21 +02:00
Ines Montani
cdf9ee1716
Add stub for Example API docs [ci skip]
2020-07-03 15:46:10 +02:00
Ines Montani
fa8e097c04
Update convert docs [ci skip]
2020-07-03 15:42:04 +02:00
Ines Montani
84fb3a3fb3
Auto-format and fix tuple
2020-07-03 15:20:10 +02:00
Ines Montani
949d4a0a0b
Merge branch 'develop' into nightly.spacy.io
2020-07-03 15:15:58 +02:00
Adriane Boyd
86d13a9fb8
Set version to 2.3.1 ( #5705 )
2020-07-03 13:38:41 +02:00
Matthew Honnibal
e1b3e8ee11
Set version to v3.0.0a1
2020-07-03 13:21:08 +02:00
Matthew Honnibal
a902b5f217
Record whether Doc objects are built from known spacing ( #5697 )
* Tell convert CLI to store user data for Doc
* Remove assert
* Add has_unknown_spaces flag on Doc
* Do not tokenize docs with unknown spaces in Corpus
* Handle conversion of unknown spaces in Example
* Fixes
* Fixes
* Draft has_known_spaces support in DocBin
* Add test for serialize has_unknown_spaces
* Fix DocBin serialization when has_unknown_spaces
* Use serialization in test
2020-07-03 12:58:16 +02:00