Matthew Honnibal
c35d6282fc
Add previous HashEmbedCNN tok2vec to make transition easier
2020-07-29 14:01:12 +02:00
Matthew Honnibal
1784c95827
Clean up link_vectors_to_models unused stuff
2020-07-29 14:01:11 +02:00
Matthew Honnibal
0c17ea4c85
Format
2020-07-29 14:00:13 +02:00
Matthew Honnibal
2aff3c4b5a
Load vectors in 'spacy train'
2020-07-29 14:00:13 +02:00
Matthew Honnibal
7852a68a75
Fix load_vectors_into_model function
2020-07-29 14:00:13 +02:00
Matthew Honnibal
7299419fe4
Don't load vectors in Language.from_config
2020-07-29 14:00:12 +02:00
Matthew Honnibal
30dd96c540
Load vectors in Language.from_config
2020-07-29 14:00:12 +02:00
Matthew Honnibal
df95e2af64
Add load_vectors_into_model util
2020-07-29 14:00:12 +02:00
Matthew Honnibal
475d7c1c7c
Fix StaticVectors class
2020-07-29 14:00:11 +02:00
Matthew Honnibal
44d350dc94
Use spaCy's StaticVectors
2020-07-29 14:00:11 +02:00
Matthew Honnibal
acc64e138a
Add import
2020-07-29 14:00:11 +02:00
Matthew Honnibal
9987ea9e4d
Fix Tok2Vec begin_training
2020-07-29 14:00:10 +02:00
Matthew Honnibal
099e9331c5
Fix tok2vec
2020-07-29 14:00:10 +02:00
Matthew Honnibal
fe0cdcd461
Fixes
2020-07-29 14:00:09 +02:00
Matthew Honnibal
123f8b832d
Refactor Tok2Vec model
2020-07-29 14:00:09 +02:00
Matthew Honnibal
c6b4f63c7c
Remove obsolete function
2020-07-29 14:00:09 +02:00
Matthew Honnibal
9cc7262224
Draft StaticVectors layer
2020-07-29 14:00:09 +02:00
Matthew Honnibal
cb9654e98c
WIP on new StaticVectors
2020-07-29 14:00:09 +02:00
Ines Montani
e257e66ab9
Merge branch 'develop' of https://github.com/explosion/spaCy into develop
2020-07-29 11:36:45 +02:00
Ines Montani
e0ffe36e79
Update docstrings, docs and types
2020-07-29 11:36:42 +02:00
Sofie Van Landeghem
40c995b1be
Option for returning only greedy matches (#5771)
* add "greedy" option for match pattern
* distinction between greedy FIRST or LONGEST
* check for proper values, throw custom warning otherwise
* unxfail one more test
* add comment in docstring
* add test that LONGEST also prefers first match if equal length
* use c arrays for more efficient processing
* rename 'greediness' to 'greedy'
2020-07-29 11:04:43 +02:00
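The FIRST / LONGEST distinction described in the commit above can be illustrated with a small pure-Python sketch. It does not reproduce the matcher's actual code (the commit mentions C arrays); `filter_greedy` and the `(start, end)` tuple layout are hypothetical names chosen for illustration, and the tie-breaking for FIRST (longest span among equal starts) is an assumption.

```python
from typing import List, Tuple

Match = Tuple[int, int]  # (start, end) token offsets, end exclusive


def filter_greedy(matches: List[Match], greedy: str) -> List[Match]:
    """Keep only non-overlapping matches, preferring FIRST or LONGEST.

    Hypothetical helper: a sketch of the idea, not the library's implementation.
    """
    if greedy not in ("FIRST", "LONGEST"):
        raise ValueError(f"greedy must be 'FIRST' or 'LONGEST', got {greedy!r}")
    if greedy == "FIRST":
        # Earliest start wins; among equal starts, assume the longer span wins.
        ordered = sorted(matches, key=lambda m: (m[0], -(m[1] - m[0])))
    else:
        # Longest span wins; among equal lengths, the earlier match wins.
        ordered = sorted(matches, key=lambda m: (-(m[1] - m[0]), m[0]))
    taken = set()  # token positions already covered by a kept match
    kept = []
    for start, end in ordered:
        if any(i in taken for i in range(start, end)):
            continue  # overlaps a match that was preferred earlier
        kept.append((start, end))
        taken.update(range(start, end))
    return sorted(kept)
```

With overlapping candidates such as `(0, 2)` and `(1, 4)`, FIRST keeps the earlier span and LONGEST keeps the longer one; when two candidates have equal length, LONGEST falls back to the earlier one, as the commit's test describes.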
Adriane Boyd
191a12d75f
Fix score_weights typo in train CLI (#5835)
2020-07-29 11:04:12 +02:00
Adriane Boyd
0cddb0dbe9
Move timing into Language.evaluate (#5836)
Move timing into `Language.evaluate` so that only the processing is
timed, not processing + scoring. `Language.evaluate` returns
`scores["speed"]` as words per second, which should be identical to how
the speed was added to the scores previously. Also add the speed to the
evaluate CLI output.
2020-07-29 11:02:31 +02:00
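The timing change described above (time the processing only, then report words per second) might look roughly like the following. This is a sketch under stated assumptions: `evaluate`, `process`, and `score` are hypothetical stand-ins, not the signature of `Language.evaluate`; only the `scores["speed"]` key comes from the commit message.

```python
import time
from typing import Any, Callable, Dict, List, Sequence


def evaluate(
    texts: List[str],
    process: Callable[[str], Sequence[Any]],
    score: Callable[[List[Sequence[Any]]], Dict[str, float]],
) -> Dict[str, float]:
    """Time only the processing step, then score outside the timed region."""
    start = time.perf_counter()
    docs = [process(text) for text in texts]  # timed: processing only
    elapsed = time.perf_counter() - start
    scores = dict(score(docs))  # not timed: scoring
    n_words = sum(len(doc) for doc in docs)
    # Words per second; guard against a zero-length interval.
    scores["speed"] = n_words / elapsed if elapsed > 0 else 0.0
    return scores
```

For example, `evaluate(["a b c", "d e"], str.split, lambda docs: {"acc": 1.0})` returns the scorer's keys plus a `"speed"` entry computed from five words.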
Ines Montani
7adffc5361
Remove unused schema
2020-07-28 23:12:47 +02:00
Ines Montani
e5d9eaf79c
Tidy up docstrings and arguments
2020-07-28 23:12:42 +02:00
Ines Montani
2c7a32cf12
Remove unused methods
2020-07-28 16:50:02 +02:00
Ines Montani
ba22111ff4
Move error to Errors
2020-07-28 16:24:14 +02:00
Ines Montani
2748249217
Re-add meta["pipeline"] for now
2020-07-28 16:14:23 +02:00
Ines Montani
b83ead5bf5
Merge pull request #5824 from svlandeg/fix/textcat-v3
2020-07-28 15:04:25 +02:00
Ines Montani
06a97a8766
Support --opt=value format in CLI config overrides
2020-07-28 13:43:15 +02:00
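Accepting both `--opt value` and `--opt=value` for config overrides can be sketched as below. The function name `parse_overrides`, the decision to keep values as strings, and raising an error for a missing value are all assumptions for illustration; the commit does not show how the CLI handles those cases.

```python
from typing import Dict, List


def parse_overrides(args: List[str]) -> Dict[str, str]:
    """Parse ['--a.b=1', '--c.d', '2'] into {'a.b': '1', 'c.d': '2'}.

    Hypothetical helper: values are left as strings in this sketch.
    """
    result: Dict[str, str] = {}
    i = 0
    while i < len(args):
        arg = args[i]
        if not arg.startswith("--"):
            raise ValueError(f"expected an option starting with '--', got {arg!r}")
        name = arg[2:]
        if "=" in name:
            # --opt=value form: split on the first '=' only.
            key, value = name.split("=", 1)
            i += 1
        elif i + 1 < len(args) and not args[i + 1].startswith("--"):
            # --opt value form: the value is the next argument.
            key, value = name, args[i + 1]
            i += 2
        else:
            raise ValueError(f"missing value for {arg!r}")
        result[key] = value
    return result
```

Both spellings produce the same mapping, so `["--training.batch_size=128"]` and `["--training.batch_size", "128"]` are interchangeable in this sketch.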
Ines Montani
ae4d8a6ffd
Update docstrings, docs and pipe consistency
2020-07-28 13:37:31 +02:00
Ines Montani
0094cb0d04
Remove scores list from config and document
2020-07-28 11:22:24 +02:00
Ines Montani
894e20c466
Merge branch 'develop' into feature/component-scores
2020-07-27 18:14:39 +02:00
Ines Montani
d8b519c23c
API docs, docstrings and argument consistency
2020-07-27 18:11:45 +02:00
svlandeg
85b2dcfd67
cleanup
2020-07-27 17:54:44 +02:00
svlandeg
61068e0fb1
util function dot_to_object and corresponding unit test
2020-07-27 17:50:12 +02:00
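A utility like `dot_to_object` presumably resolves a dotted path against a nested config. The sketch below assumes a plain nested `dict` and a `KeyError` on a missing section; the actual function's signature and error handling are not shown in this log.

```python
from typing import Any, Dict


def dot_to_object(config: Dict[str, Any], section: str) -> Any:
    """Return the value at a dotted path such as 'training.optimizer'.

    Sketch only: assumes a nested dict and raises KeyError for unknown paths.
    """
    node: Any = config
    for part in section.split("."):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(f"section {section!r} not found (failed at {part!r})")
        node = node[part]
    return node
```

Given `{"training": {"optimizer": {"learn_rate": 0.001}}}`, the path `"training.optimizer"` resolves to the inner dict and `"training.optimizer.learn_rate"` to the number.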
Ines Montani
10b84e1e27
Add flag to toggle sdist creation on package [ci skip]
2020-07-27 16:52:23 +02:00
Adriane Boyd
34c92dfe63
Add missing Scorer imports
2020-07-27 15:08:51 +02:00
Adriane Boyd
8bb0507777
Add and update score methods and score weights
Add and update `score` methods, provided `scores`, and default weights
`default_score_weights` for pipeline components.
* `scores` provides all top-level keys returned by `score` (merely informative, similar to `assigns`).
* `default_score_weights` provides the default weights for a default config.
* The keys from `default_score_weights` determine which values will be
shown in the `spacy train` output, so keys with weight `0.0` will be
displayed but not counted toward the overall score.
2020-07-27 14:44:53 +02:00
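The relationship between displayed keys and the overall score described above can be made concrete with a short sketch. The helper names and the example score keys are illustrative assumptions; the only behaviour taken from the commit is that every key in the weights is shown, while a weight of `0.0` contributes nothing to the total.

```python
from typing import Dict, List


def displayed_keys(score_weights: Dict[str, float]) -> List[str]:
    """Every key in the weights is shown, including those weighted 0.0."""
    return list(score_weights)


def overall_score(scores: Dict[str, float], score_weights: Dict[str, float]) -> float:
    """Weighted sum of scores; keys with weight 0.0 do not affect the result."""
    return sum(scores.get(key, 0.0) * weight for key, weight in score_weights.items())
```

With weights `{"tag_acc": 0.5, "dep_uas": 0.5, "speed": 0.0}`, all three columns would be displayed, but the overall score depends only on `tag_acc` and `dep_uas`.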
Adriane Boyd
baf19fd652
Update cats scoring to provide overall score
* Provide top-level score as `attr_score`
* Provide a description of the score as `attr_score_desc`
* Provide all potential scores keys, setting unused keys to `None`
* Update CLI evaluate accordingly
2020-07-27 12:26:10 +02:00
Adriane Boyd
f8cf378be9
Combine weights from multiple components
Combine weights from multiple components for the same score.
2020-07-27 10:21:31 +02:00
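The commit above does not say how weights for the same score are combined. One simple possibility, shown purely as an assumption, is to add the weights that different components declare for an identical key; `combine_weights` is a hypothetical name and this is not necessarily what the commit implements.

```python
from typing import Dict, Iterable


def combine_weights(component_weights: Iterable[Dict[str, float]]) -> Dict[str, float]:
    """Merge per-component weight dicts, summing weights that share a key.

    Assumption: plain addition; the real combination rule is not stated in the log.
    """
    combined: Dict[str, float] = {}
    for weights in component_weights:
        for key, weight in weights.items():
            combined[key] = combined.get(key, 0.0) + weight
    return combined
```

Under this assumption, two components that both report `tag_acc` end up with a single `tag_acc` entry whose weight is the sum of the two.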
Ines Montani
3d56a3f286
Make more args keyword-only
2020-07-27 00:27:53 +02:00
Matthew Honnibal
80271ac0ba
Update default config
2020-07-26 15:27:39 +02:00
Ines Montani
ed61fb10fc
Rename default textcat arch to TextCatEnsemble
2020-07-26 15:11:43 +02:00
Ines Montani
53d37da29a
Make sure @factories is removed from config
2020-07-26 15:11:24 +02:00
Ines Montani
4060c2d5a6
Fix test
2020-07-26 13:40:19 +02:00
Ines Montani
2470486543
Allow pipeline components to set default scores and weights
2020-07-26 13:18:43 +02:00
Ines Montani
787d066e22
Remove pipes.pyx
Probably accidentally re-added in a merge?
2020-07-26 13:08:52 +02:00
Matthew Honnibal
520d25cb50
Add smart_open dependency to fetch project assets (#5812)
* Use smart_open for project assets
* Fix assets.py
* Update pyproject.toml
2020-07-26 12:15:00 +02:00
Ines Montani
e92df281ce
Tidy up, autoformat, add types
2020-07-25 15:01:15 +02:00