Commit Graph

22 Commits

Author SHA1 Message Date
Sofie Van Landeghem
e48a09df4e Example class for training data (#4543)
* OrigAnnot class instead of gold.orig_annot list of zipped tuples

* from_orig to replace from_annot_tuples

* rename to RawAnnot

* some unit tests for GoldParse creation and internal format

* removing orig_annot and switching to lists instead of tuple

* rewriting tuples to use RawAnnot (+ debug statements, WIP)

* fix pop() changing the data

* small fixes

* pop-append fixes

* return RawAnnot for existing GoldParse to have uniform interface

* clean up imports

* fix merge_sents

* add unit test for 4402 with new structure (not working yet)

* introduce DocAnnot

* typo fixes

* add unit test for merge_sents

* rename from_orig to from_raw

* fixing unit tests

* fix nn parser

* read_annots to produce text, doc_annot pairs

* _make_golds fix

* rename golds_to_gold_annots

* small fixes

* fix encoding

* have golds_to_gold_annots use DocAnnot

* missed a spot

* merge_sents as function in DocAnnot

* allow specifying only part of the token-level annotations

* refactor with Example class + underlying dicts

* pipeline components to work with Example objects (wip)

* input checking

* fix yielding

* fix calls to update

* small fixes

* fix scorer unit test with new format

* fix kwargs order

* fixes for ud and conllu scripts

* fix reading data for conllu script

* add in proper errors (not fixed numbering yet to avoid merge conflicts)

* fixing few more small bugs

* fix EL script
2019-11-11 17:35:27 +01:00
Ines Montani
399987c216 Test and update examples [ci skip] 2019-03-16 14:15:49 +01:00
Ines Montani
c9a89bba50 Don't call begin_training if updating new model (see #3059) [ci skip] 2018-12-17 13:45:28 +01:00
Ines Montani
6f1438b5d9 Auto-format example 2018-12-17 13:44:38 +01:00
Ines Montani
4cd9ec0f00
💫 Update training examples and use minibatching (#2830)
<!--- Provide a general summary of your changes in the title. -->

## Description
Update the training examples in `/examples/training` to show usage of spaCy's `minibatch` and `compounding` helpers ([see here](https://spacy.io/usage/training#tips-batch-size) for details). The lack of batching in the examples has caused some confusion in the past, especially for beginners who would copy-paste the examples, update them with large training sets and experienced slow and unsatisfying results.

### Types of change
enhancements

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-10-10 01:40:29 +02:00
ines
a09c096d3c Get docs ready for v2.0.0 2017-11-07 12:00:43 +01:00
ines
173b1551af Update examples 2017-11-07 01:22:30 +01:00
ines
1b1c9105b4 Update example compatibility statements 2017-11-07 01:11:45 +01:00
ines
fe498b3d5e Update training examples to use "simple style" 2017-11-06 23:14:04 +01:00
ines
4b196fdf7f Fix formatting 2017-11-01 00:43:22 +01:00
ines
f81cc0bd1c Fix usage of disable_pipes 2017-10-27 00:31:30 +02:00
ines
bc2c92f22d Use plac annotations for arguments 2017-10-26 16:10:56 +02:00
ines
d425ede7e9 Fix example 2017-10-26 15:15:08 +02:00
ines
9d58673aaf Update train_ner example for spaCy v2.0 2017-10-26 14:24:12 +02:00
ines
992559bf9a Fix formatting and remove unused imports 2017-06-01 12:47:18 +02:00
Matthew Honnibal
5c30466c95 Update NER training example 2017-05-31 13:42:12 +02:00
Matthew Honnibal
ab70f6e18d Update NER training example 2017-01-27 12:27:10 +01:00
Christos Savvopoulos
ad54a929f8 train_ner should save vocab; add load_ner example 2016-12-12 20:09:49 +00:00
kendricktan
ba8841234a Fixed training examples
Changes:
1. train_ner won't crash if no data directory is not found
2. Fixed train_tagger expected spacy.gold.GoldParse, got list
2016-10-24 16:09:23 +10:00
kendricktan
9877f3298f updated training examples to v1.1.2 2016-10-24 11:53:33 +10:00
kendricktan
d817d57219 Fixed train_ner examples when model_dir isn't None 2016-10-20 21:09:07 +10:00
Matthew Honnibal
f787cd29fe Refactor the pipeline classes to make them more consistent, and remove the redundant blank() constructor. 2016-10-16 21:34:57 +02:00