Commit Graph

10106 Commits

Author SHA1 Message Date
Matthew Honnibal
1612990e88 Implement cosine loss for spacy pretrain. Make default 2019-03-20 11:06:58 +00:00
Ines Montani
ae5b4d0e84 Fix formatting (hopefully also restarts build properly) 2019-03-20 09:55:45 +01:00
Ines Montani
6abc1ddb26 Update __main__.py 2019-03-20 09:43:26 +01:00
Bharat123Rox
f2547f02d6 Made changes suggested by @ines 2019-03-20 07:43:19 +05:30
Ines Montani
7400c7f8a7 Move UD scripts to bin 2019-03-20 01:19:34 +01:00
Ines Montani
685fff40cf Revert "Add --always-link flag to cli.download (see #3435)"
This reverts commit 583a566843.
2019-03-20 01:03:40 +01:00
Matthew Honnibal
6cfbb2d34e Merge branch 'master' of https://github.com/explosion/spaCy 2019-03-20 00:59:54 +01:00
Matthew Honnibal
5a53e9358a Set version to 2.1.1 2019-03-20 00:59:45 +01:00
Matthew Honnibal
02d7b41893 Fix GPU installation. Closes #3437 2019-03-20 00:59:27 +01:00
Ines Montani
583a566843 Add --always-link flag to cli.download (see #3435) 2019-03-19 22:03:27 +01:00
svlandeg
b7ca3de358 check the length of entities and probabilities vector + unit test 2019-03-19 21:55:10 +01:00
svlandeg
7402bb4c06 correct size, not counting dummy elements in the vector 2019-03-19 21:50:32 +01:00
svlandeg
f0decf98f1 check and unit test in case prior probs exceed 1 2019-03-19 21:43:48 +01:00
svlandeg
2f2f821648 avoid value 0 in preshmap and helpful user warnings 2019-03-19 21:35:24 +01:00
Bharat123Rox
b5f077dcf4 Sign the Contributor Agreement and update details 2019-03-19 23:07:54 +05:30
Bharat123Rox
6db1ddd9c7 Raise ValueError for narrow unicode build 2019-03-19 23:02:58 +05:30
svlandeg
19d3a2f9aa raising error when adding alias for unknown entity + unit test 2019-03-19 17:39:35 +01:00
svlandeg
1d20f19208 use StringStore 2019-03-19 16:43:23 +01:00
svlandeg
1fba7219fb bugfix adding aliases 2019-03-19 16:15:38 +01:00
svlandeg
c62cca3368 get candidates by alias 2019-03-19 15:51:56 +01:00
Ines Montani
1aff3ad770 Update netlify.toml 2019-03-19 14:49:35 +01:00
Ines Montani
f7b5ff7907 Move netlify.toml to root 2019-03-19 14:40:14 +01:00
Ines Montani
c6ee030721 Fix docsearch 2019-03-19 14:38:49 +01:00
Ines Montani
0155083e01 Update netlify.toml 2019-03-19 14:07:00 +01:00
Mehdi Hamoumi
9211f30ee3 Tiny correction in french lookup dictionary (#3427) 2019-03-19 13:00:19 +01:00
Ines Montani
d4eed4a84f Add note on unicode build to troubleshooting guide (see #3421) [ci skip] 2019-03-19 10:27:02 +01:00
Ines Montani
42d4b818e4 Redirect Netlify URL 2019-03-19 10:17:56 +01:00
Ines Montani
1ee97bc282 Add page title fallback, just in case 2019-03-18 18:58:55 +01:00
Ines Montani
728ae7651b Fix universe page titles if no separate title is set 2019-03-18 18:58:46 +01:00
svlandeg
a4d876d471 adding and retrieving aliases 2019-03-18 17:50:01 +01:00
svlandeg
a14fb54b17 very minimal KB functionality working 2019-03-18 17:27:51 +01:00
Ines Montani
a20d3772fd FIx responsive landing 2019-03-18 16:24:52 +01:00
Ines Montani
08284f3a11
💫 v2.1.0 launch updates (only merge on launch!) (#3414)
* Update README.md

* Use production docsearch [ci skip]

* Add option to exclude pages from search
2019-03-18 16:07:26 +01:00
svlandeg
5ac7edf53c adding aliases per entity in the KB 2019-03-18 12:38:40 +01:00
svlandeg
3945fd21b0 fix compile errors 2019-03-18 10:31:01 +01:00
Ines Montani
f0c1efcb00 Set version to 2.1.0 2019-03-17 22:42:58 +01:00
Matthew Honnibal
47e110375d Fix jsonl to json conversion (#3419)
* Fix spacy.gold.docs_to_json function

* Fix jsonl2json converter
2019-03-17 22:12:54 +01:00
Matthew Honnibal
0a4b074184 Improve beam search defaults 2019-03-17 21:47:45 +01:00
Ines Montani
226db621d0 Strip out .dev versions in spacy validate [ci skip] 2019-03-17 12:16:53 +01:00
Ines Montani
a611b32fbf Update model docs [ci skip] 2019-03-17 11:48:18 +01:00
Matthew Honnibal
c6be9964ec Set version to v2.1.0.dev1 2019-03-16 21:47:41 +01:00
Matthew Honnibal
61617c64d5 Revert changes to optimizer default hyper-params (WIP) (#3415)
While developing v2.1, I ran a bunch of hyper-parameter search
experiments to find settings that performed well for spaCy's NER and
parser. I ended up changing the default Adam settings from beta1=0.9,
beta2=0.999, eps=1e-8 to beta1=0.8, beta2=0.8, eps=1e-5. This was giving
a small improvement in accuracy (like, 0.4%).

Months later, I run the models with Prodigy, which uses beam-search
decoding even when the model has been trained with a greedy objective.
The new models performed terribly...So, wtf? After a couple of days
debugging, I figured out that the new optimizer settings was causing the
model to converge to solutions where the top-scoring class often had
a score of like, -80. The variance on the weights had gone up
enormously. I guess I needed to update the L2 regularisation as well?

Anyway. Let's just revert the change --- if the optimizer is finding
such extreme solutions, that seems bad, and not nearly worth the small
improvement in accuracy.

Currently training a slate of models, to verify the accuracy change is minimal.
Once the training is complete, we can merge this.

<!--- Provide a general summary of your changes in the title. -->

## Description
<!--- Use this section to describe your changes. If your changes required
testing, include information about the testing environment and the tests you
ran. If your test fixes a bug reported in an issue, don't forget to include the
issue number. If your PR is still a work in progress, that's totally fine – just
include a note to let us know. -->

### Types of change
<!-- What type of change does your PR cover? Is it a bug fix, an enhancement
or new feature, or a change to the documentation? -->

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-03-16 21:39:02 +01:00
Matthew Honnibal
62afa64a8d Expose batch size and length caps on CLI for pretrain (#3417)
Add and document CLI options for batch size, max doc length, min doc length for `spacy pretrain`.

Also improve CLI output.

Closes #3216 

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-03-16 21:38:45 +01:00
Matthew Honnibal
58d562d9b0
Merge pull request #3416 from explosion/feature/improve-beam
Improve beam search support
2019-03-16 18:42:18 +01:00
Ines Montani
2c5dd4d602 Update Vectors.find docs [ci skip] 2019-03-16 17:10:57 +01:00
Ines Montani
0f8739c7cb Update train.py 2019-03-16 16:04:15 +01:00
Ines Montani
e7aa25d9b1 Fix beam width integration 2019-03-16 16:02:47 +01:00
Ines Montani
c94742ff64 Only add beam width if customised 2019-03-16 15:55:31 +01:00
Ines Montani
7a354761c7 Auto-format 2019-03-16 15:55:13 +01:00
Matthew Honnibal
daa8c3787a Add eval_beam_widths argument to spacy train 2019-03-16 15:02:39 +01:00