spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-12-28 02:46:35 +03:00

Author	SHA1	Message	Date
Ines Montani	24f72c669c	Merge branch 'develop' into master-tmp	2020-05-21 18:39:06 +02:00
adrianeboyd	70da1fd2d6	Add warning for misaligned character offset spans (#5007) * Add warning for misaligned character offset spans * Resolve conflict * Filter warnings in example scripts Filter warnings in example scripts to show warnings once, in particular warnings about misaligned entities. Co-authored-by: Ines Montani <ines@ines.io>	2020-05-19 16:01:18 +02:00
Sofie Van Landeghem	0d94737857	Feature toggle_pipes (#5378) * make disable_pipes deprecated in favour of the new toggle_pipes * rewrite disable_pipes statements * update documentation * remove bin/wiki_entity_linking folder * one more fix * remove deprecated link to documentation * few more doc fixes * add note about name change to the docs * restore original disable_pipes * small fixes * fix typo * fix error number to W096 * rename to select_pipes * also make changes to the documentation Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>	2020-05-18 22:27:10 +02:00
Matthew Honnibal	333b1a308b	Adapt parser and NER for transformers (#5449) * Draft layer for BILUO actions * Fixes to biluo layer * WIP on BILUO layer * Add tests for BILUO layer * Format * Fix transitions * Update test * Link in the simple_ner * Update BILUO tagger * Update __init__ * Import simple_ner * Update test * Import * Add files * Add config * Fix label passing for BILUO and tagger * Fix label handling for simple_ner component * Update simple NER test * Update config * Hack train script * Update BILUO layer * Fix SimpleNER component * Update train_from_config * Add biluo_to_iob helper * Add IOB layer * Add IOBTagger model * Update biluo layer * Update SimpleNER tagger * Update BILUO * Read random seed in train-from-config * Update use of normal_init * Fix normalization of gradient in SimpleNER * Update IOBTagger * Remove print * Tweak masking in BILUO * Add dropout in SimpleNER * Update thinc * Tidy up simple_ner * Fix biluo model * Unhack train-from-config * Update setup.cfg and requirements * Add tb_framework.py for parser model * Try to avoid memory leak in BILUO * Move ParserModel into spacy.ml, avoid need for subclass. * Use updated parser model * Remove incorrect call to model.initialize in PrecomputableAffine * Update parser model * Avoid divide by zero in tagger * Add extra dropout layer in tagger * Refine minibatch_by_words function to avoid oom * Fix parser model after refactor * Try to avoid div-by-zero in SimpleNER * Fix infinite loop in minibatch_by_words * Use SequenceCategoricalCrossentropy in Tagger * Fix parser model when hidden layer * Remove extra dropout from tagger * Add extra nan check in tagger * Fix thinc version * Update tests and imports * Fix test * Update test * Update tests * Fix tests * Fix test Co-authored-by: Ines Montani <ines@ines.io>	2020-05-18 22:23:33 +02:00
Ines Montani	de11ea753a	Merge branch 'master' into develop	2020-02-18 14:47:23 +01:00
Sofie Van Landeghem	fbfc418745	run normal textcat train script with transformers (#4834) * keep trf tok2vec and wordpiecer components during update * also support transformer models for other example scripts	2020-01-16 02:01:23 +01:00
Sofie Van Landeghem	e48a09df4e	Example class for training data (#4543) * OrigAnnot class instead of gold.orig_annot list of zipped tuples * from_orig to replace from_annot_tuples * rename to RawAnnot * some unit tests for GoldParse creation and internal format * removing orig_annot and switching to lists instead of tuple * rewriting tuples to use RawAnnot (+ debug statements, WIP) * fix pop() changing the data * small fixes * pop-append fixes * return RawAnnot for existing GoldParse to have uniform interface * clean up imports * fix merge_sents * add unit test for 4402 with new structure (not working yet) * introduce DocAnnot * typo fixes * add unit test for merge_sents * rename from_orig to from_raw * fixing unit tests * fix nn parser * read_annots to produce text, doc_annot pairs * _make_golds fix * rename golds_to_gold_annots * small fixes * fix encoding * have golds_to_gold_annots use DocAnnot * missed a spot * merge_sents as function in DocAnnot * allow specifying only part of the token-level annotations * refactor with Example class + underlying dicts * pipeline components to work with Example objects (wip) * input checking * fix yielding * fix calls to update * small fixes * fix scorer unit test with new format * fix kwargs order * fixes for ud and conllu scripts * fix reading data for conllu script * add in proper errors (not fixed numbering yet to avoid merge conflicts) * fixing few more small bugs * fix EL script	2019-11-11 17:35:27 +01:00
Ines Montani	399987c216	Test and update examples [ci skip]	2019-03-16 14:15:49 +01:00
Ines Montani	c9a89bba50	Don't call begin_training if updating new model (see #3059) [ci skip]	2018-12-17 13:45:28 +01:00
Ines Montani	6f1438b5d9	Auto-format example	2018-12-17 13:44:38 +01:00
Ines Montani	4cd9ec0f00	💫 Update training examples and use minibatching (#2830) ## Description Update the training examples in `/examples/training` to show usage of spaCy's `minibatch` and `compounding` helpers ([see here](https://spacy.io/usage/training#tips-batch-size) for details). The lack of batching in the examples has caused some confusion in the past, especially for beginners who would copy-paste the examples, update them with large training sets, and experience slow and unsatisfying results. ### Types of change enhancements ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2018-10-10 01:40:29 +02:00
ines	a09c096d3c	Get docs ready for v2.0.0	2017-11-07 12:00:43 +01:00
ines	173b1551af	Update examples	2017-11-07 01:22:30 +01:00
ines	1b1c9105b4	Update example compatibility statements	2017-11-07 01:11:45 +01:00
ines	fe498b3d5e	Update training examples to use "simple style"	2017-11-06 23:14:04 +01:00
ines	4b196fdf7f	Fix formatting	2017-11-01 00:43:22 +01:00
ines	f81cc0bd1c	Fix usage of disable_pipes	2017-10-27 00:31:30 +02:00
ines	bc2c92f22d	Use plac annotations for arguments	2017-10-26 16:10:56 +02:00
ines	d425ede7e9	Fix example	2017-10-26 15:15:08 +02:00
ines	9d58673aaf	Update train_ner example for spaCy v2.0	2017-10-26 14:24:12 +02:00
ines	992559bf9a	Fix formatting and remove unused imports	2017-06-01 12:47:18 +02:00
Matthew Honnibal	5c30466c95	Update NER training example	2017-05-31 13:42:12 +02:00
Matthew Honnibal	ab70f6e18d	Update NER training example	2017-01-27 12:27:10 +01:00
Christos Savvopoulos	ad54a929f8	train_ner should save vocab; add load_ner example	2016-12-12 20:09:49 +00:00
kendricktan	ba8841234a	Fixed training examples Changes: 1. train_ner won't crash if no data directory is found 2. Fixed train_tagger error "expected spacy.gold.GoldParse, got list"	2016-10-24 16:09:23 +10:00
kendricktan	9877f3298f	updated training examples to v1.1.2	2016-10-24 11:53:33 +10:00
kendricktan	d817d57219	Fixed train_ner examples when model_dir isn't None	2016-10-20 21:09:07 +10:00
Matthew Honnibal	f787cd29fe	Refactor the pipeline classes to make them more consistent, and remove the redundant blank() constructor.	2016-10-16 21:34:57 +02:00

28 Commits