Commit Graph

9973 Commits

Author SHA1 Message Date
svlandeg
5ac7edf53c adding aliases per entity in the KB 2019-03-18 12:38:40 +01:00
svlandeg
3945fd21b0 fix compile errors 2019-03-18 10:31:01 +01:00
Ines Montani
f0c1efcb00 Set version to 2.1.0 2019-03-17 22:42:58 +01:00
Matthew Honnibal
47e110375d Fix jsonl to json conversion (#3419)
* Fix spacy.gold.docs_to_json function

* Fix jsonl2json converter
2019-03-17 22:12:54 +01:00
Matthew Honnibal
0a4b074184 Improve beam search defaults 2019-03-17 21:47:45 +01:00
Ines Montani
226db621d0 Strip out .dev versions in spacy validate [ci skip] 2019-03-17 12:16:53 +01:00
Ines Montani
a611b32fbf Update model docs [ci skip] 2019-03-17 11:48:18 +01:00
Matthew Honnibal
c6be9964ec Set version to v2.1.0.dev1 2019-03-16 21:47:41 +01:00
Matthew Honnibal
61617c64d5 Revert changes to optimizer default hyper-params (WIP) (#3415)
While developing v2.1, I ran a bunch of hyper-parameter search
experiments to find settings that performed well for spaCy's NER and
parser. I ended up changing the default Adam settings from beta1=0.9,
beta2=0.999, eps=1e-8 to beta1=0.8, beta2=0.8, eps=1e-5. This gave
a small improvement in accuracy (about 0.4%).

Months later, I ran the models with Prodigy, which uses beam-search
decoding even when the model has been trained with a greedy objective.
The new models performed terribly... So, wtf? After a couple of days of
debugging, I figured out that the new optimizer settings were causing the
model to converge to solutions where the top-scoring class often had
a score of around -80. The variance on the weights had gone up
enormously. I guess I needed to update the L2 regularisation as well?

Anyway, let's just revert the change. If the optimizer is finding
such extreme solutions, that seems bad, and it's not nearly worth the small
improvement in accuracy.
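For reference, the three hyper-parameters above plug into the standard Adam update with bias correction. The sketch below uses the textbook formulation for a single scalar parameter. It is not Thinc's optimizer code, and `adam_step`, `effective_window`, `OLD` and `NEW` are hypothetical names used only for illustration. An exponential moving average with decay `beta` averages over roughly `1 / (1 - beta)` recent steps. So moving `beta2` from 0.999 to 0.8 shrinks the squared-gradient window from about 1,000 steps to about 5.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one textbook Adam update to a scalar parameter (hypothetical helper).

    t is the 1-based step count. Returns the updated (param, m, v).
    """
    # Exponential moving averages of the gradient and of the squared gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Correct the bias towards zero caused by initialising m and v at 0.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # The step is lr times the ratio of the averaged gradient to its RMS.
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


def effective_window(beta):
    """Approximate number of recent steps an EMA with decay beta averages over."""
    return 1.0 / (1.0 - beta)


# The two settings discussed in this PR (names are illustrative).
OLD = dict(beta1=0.9, beta2=0.999, eps=1e-8)
NEW = dict(beta1=0.8, beta2=0.8, eps=1e-5)

for name, cfg in (("0.9/0.999/1e-8", OLD), ("0.8/0.8/1e-5", NEW)):
    print(name, "beta2 window ~", round(effective_window(cfg["beta2"])), "steps")
```

Running it prints a window of about 1000 steps for the old defaults and about 5 for the search result. This only shows where the parameters enter the update. It does not by itself reproduce the extreme scores described above.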

I'm currently training a slate of models to verify that the accuracy change is minimal.
Once the training is complete, we can merge this.

## Description

### Types of change

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-03-16 21:39:02 +01:00
Matthew Honnibal
62afa64a8d Expose batch size and length caps on CLI for pretrain (#3417)
Add and document CLI options for batch size, max doc length and min doc length for `spacy pretrain`.

Also improve CLI output.

Closes #3216 

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-03-16 21:38:45 +01:00
Matthew Honnibal
58d562d9b0
Merge pull request #3416 from explosion/feature/improve-beam
Improve beam search support
2019-03-16 18:42:18 +01:00
Ines Montani
2c5dd4d602 Update Vectors.find docs [ci skip] 2019-03-16 17:10:57 +01:00
Ines Montani
0f8739c7cb Update train.py 2019-03-16 16:04:15 +01:00
Ines Montani
e7aa25d9b1 Fix beam width integration 2019-03-16 16:02:47 +01:00
Ines Montani
c94742ff64 Only add beam width if customised 2019-03-16 15:55:31 +01:00
Ines Montani
7a354761c7 Auto-format 2019-03-16 15:55:13 +01:00
Matthew Honnibal
daa8c3787a Add eval_beam_widths argument to spacy train 2019-03-16 15:02:39 +01:00
Ines Montani
2eecd756fa Update package name 2019-03-16 14:43:53 +01:00
Ines Montani
399987c216 Test and update examples [ci skip] 2019-03-16 14:15:49 +01:00
Ines Montani
f55a52a2dd Set version to v2.1.0.dev0 2019-03-16 13:47:03 +01:00
Ines Montani
f6ffbe1fd3 Fix filename 2019-03-16 13:46:58 +01:00
Ines Montani
fb53eb570f Fix typo 2019-03-16 13:45:46 +01:00
Ines Montani
146dc2766a Update README.md 2019-03-16 13:34:23 +01:00
Ines Montani
68454246a8 Update README.md 2019-03-16 13:34:01 +01:00
Ines Montani
b515a3efbe Update requirements.txt 2019-03-16 13:33:55 +01:00
Matthew Honnibal
9a34d38829
Merge pull request #3413 from explosion/develop
💫 Merge develop (v2.1) into master
2019-03-16 13:33:01 +01:00
Ines Montani
dc933110d4 Merge branch 'spacy.io' into develop 2019-03-15 18:17:12 +01:00
Ryan Ford
00842d7f1b Merging conversion scripts for conll formats (#3405)
* merging conllu/conll and conllubio scripts

* tabs to spaces

* removing conllubio2json from converters/__init__.py

* Move not-really-CLI tests to misc

* Add converter test using no-ud data

* Fix test I broke

* removing include_biluo parameter

* fixing read_conllx

* remove include_biluo from convert.py
2019-03-15 18:14:46 +01:00
Ines Montani
bec8db91e6 Add actual deprecation warning for n_threads (resolves #3410) 2019-03-15 16:38:44 +01:00
Ines Montani
cb5dbfa63a Tidy up references to n_threads and fix default 2019-03-15 16:24:26 +01:00
Ines Montani
852e1f105c Tidy up docstrings 2019-03-15 16:23:17 +01:00
svlandeg
56b55e3bcd add pyx and separate method to add aliases 2019-03-15 16:05:23 +01:00
Matthew Honnibal
b13b2aeb54 Use hash_state in beam 2019-03-15 15:22:58 +01:00
Matthew Honnibal
693c8934e8 Normalize over all actions in parser, not just valid ones 2019-03-15 15:22:16 +01:00
Matthew Honnibal
b94b2b1168 Export hash_state from beam_utils 2019-03-15 15:20:28 +01:00
Matthew Honnibal
ad56641324 Fix Language.evaluate 2019-03-15 15:20:09 +01:00
Matthew Honnibal
f762c36e61 Evaluate accuracy at multiple beam widths 2019-03-15 15:19:49 +01:00
svlandeg
dc603fb85e hash the entity name 2019-03-15 15:00:53 +01:00
Ines Montani
fa0f501165 Use dev DocSearch index 2019-03-15 14:48:38 +01:00
Ines Montani
8af7d01382 Fix general-purpose IDs 2019-03-15 14:48:26 +01:00
svlandeg
b6bac49444 documented some comments and todos 2019-03-15 11:37:24 +01:00
svlandeg
097e5f3da1 kb snippet, draft by Matt (wip) 2019-03-15 11:17:35 +01:00
Matthew Honnibal
0703f5986b Remove hack from beam 2019-03-15 00:48:39 +01:00
Sofie
c45ed32c74 label in span not writable anymore (#3408)
* label in span not writable anymore

* more explicit unit test and error message for readonly label

* bit more explanation (view)

* error msg tailored to specific case

* fix None case
2019-03-15 00:46:45 +01:00
Ines Montani
cbcba699dd Fix missing ids 2019-03-14 17:56:53 +01:00
Ines Montani
cffe63ea24 Fix :target padding for ids 2019-03-14 17:41:02 +01:00
Ines Montani
51b7b88acf Generate active sidebar heading (h0) at compile time 2019-03-14 17:20:51 +01:00
Ines Montani
4ab1871a75 Add search-exclude classes 2019-03-14 16:51:29 +01:00
Ines Montani
59bbf85986 Add id to body 2019-03-14 16:51:18 +01:00
svlandeg
5f002e9ced annotate kb_id through ents in doc 2019-03-14 16:31:46 +01:00