spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-09-19 02:22:43 +03:00

Author	SHA1	Message	Date
svlandeg	5121972930	add types of Tok2Vec embedding layers	2020-10-01 09:20:09 +02:00
svlandeg	64d90039a1	encoding UTF8	2020-09-29 10:54:42 +02:00
Ines Montani	3fa30a7f2d	Merge pull request #6159 from svlandeg/fix/pydantic-pin upgrade pydantic pin	2020-09-28 18:18:02 +02:00
svlandeg	cd21eb2485	upgrade pydantic pin for thinc's field.default_factory	2020-09-28 16:45:48 +02:00
Matthew Honnibal	a976da168c	Support data augmentation in Corpus (#6155) * Support data augmentation in Corpus * Note initial docs for data augmentation * Add augmenter to quickstart * Fix flake8 * Format * Fix test * Update spacy/tests/training/test_training.py * Improve data augmentation arguments * Update templates * Move randomization out into caller * Refactor * Update spacy/training/augment.py * Update spacy/tests/training/test_training.py * Fix augment * Fix test	2020-09-28 03:03:27 +02:00
Ines Montani	cad4dbddaa	Merge pull request #6156 from explosion/feature/new-thinc-config-resolution	2020-09-27 23:57:52 +02:00
Ines Montani	9016d23cc5	Fix exclude and add test	2020-09-27 23:34:03 +02:00
Ines Montani	c0c842ae5b	Update Thinc version	2020-09-27 23:24:40 +02:00
Ines Montani	658fad428a	Fix base schema integration	2020-09-27 22:50:36 +02:00
Ines Montani	47c6a461e5	Revert except all in CLI error handling [ci skip]	2020-09-27 22:41:00 +02:00
Ines Montani	5c53a76021	Improve CLI error handling [ci skip]	2020-09-27 22:39:04 +02:00
Ines Montani	e04bd16f7f	Merge branch 'develop' into feature/new-thinc-config-resolution	2020-09-27 22:34:46 +02:00
Ines Montani	d7ad65a9bb	Fix handling of error description [ci skip]	2020-09-27 22:31:57 +02:00
Ines Montani	7e938ed63e	Update config resolution to use new Thinc	2020-09-27 22:21:31 +02:00
Adriane Boyd	013b66de05	Add tokenizer scoring to ja / ko / zh (#6152)	2020-09-27 22:20:45 +02:00
Adriane Boyd	a6548ead17	Add _ as a symbol (#6153) * Add _ to StringStore in Morphology * Add _ as a symbol Add `_` as a symbol instead of adding to the `StringStore`.	2020-09-27 22:20:14 +02:00
Ines Montani	f29d5b9b89	Update docs [ci skip]	2020-09-27 18:39:38 +02:00
Ines Montani	3838b14148	Merge pull request #6151 from explosion/fix/train-config-interpolation	2020-09-26 15:56:45 +02:00
Ines Montani	b4486d747d	Merge branch 'develop' into fix/train-config-interpolation	2020-09-26 15:32:14 +02:00
Ines Montani	8fea06d55e	Merge pull request #6149 from adrianeboyd/feature/attributeruler-match-ids Simplify string match IDs for AttributeRuler	2020-09-26 15:31:30 +02:00
Ines Montani	b78a60ef73	Merge pull request #6150 from explosion/feature/cli-config-validation Improve CLI config validation with latest Thinc	2020-09-26 15:30:51 +02:00
Ines Montani	b2d07de786	Construct nlp from uninterpolated config before training	2020-09-26 15:16:59 +02:00
Ines Montani	e06ff8b71d	Update docs [ci skip]	2020-09-26 13:18:08 +02:00
Ines Montani	ca3c997062	Improve CLI config validation with latest Thinc	2020-09-26 13:13:57 +02:00
Adriane Boyd	6c25e60089	Simplify string match IDs for AttributeRuler	2020-09-26 11:12:39 +02:00
Matthew Honnibal	702edf52a0	Fix attributeruler	2020-09-26 00:30:48 +02:00
Matthew Honnibal	821f37254c	Fix attributeruler	2020-09-26 00:19:53 +02:00
Matthew Honnibal	98327f66a9	Fix attributeruler key	2020-09-25 23:20:50 +02:00
Matthew Honnibal	092ce4648e	Make DocBin output stable data (set iteration)	2020-09-25 22:20:44 +02:00
Matthew Honnibal	26afd3bd90	Fix iteration order	2020-09-25 21:47:22 +02:00
Matthew Honnibal	3d8388969e	Sort paths for cache consistency	2020-09-25 19:07:26 +02:00
Adriane Boyd	c3b5a3cfff	Clean up MorphAnalysisC struct (#6146)	2020-09-25 15:56:48 +02:00
Sofie Van Landeghem	009ba14aaf	Fix pretraining in train script (#6143) * update pretraining API in train CLI * bump thinc to 8.0.0a35 * bump to 3.0.0a26 * doc fixes * small doc fix	2020-09-25 15:47:10 +02:00
Ines Montani	02a1b6ab83	Update links [ci skip]	2020-09-25 13:21:43 +02:00
Ines Montani	2cfe9340a1	Link model components [ci skip]	2020-09-25 13:21:20 +02:00
Ines Montani	35f707aa59	Merge pull request #6145 from adrianeboyd/bugfix/revert-score-spans-edits Revert changes to Scorer.score_spans	2020-09-25 09:28:00 +02:00
Ines Montani	c7956a4047	Update models.js [ci skip]	2020-09-25 09:25:46 +02:00
Adriane Boyd	50f20cf722	Revert changes to Scorer.score_spans	2020-09-25 08:21:47 +02:00
Matthew Honnibal	93d7ff309f	Remove print	2020-09-24 21:05:27 +02:00
Ines Montani	2aa4d65734	Update docs [ci skip]	2020-09-24 20:41:09 +02:00
Matthew Honnibal	16475528f7	Fix skipped documents in entity scorer (#6137) * Fix skipped documents in entity scorer * Add back the skipping of unannotated entities * Update spacy/scorer.py * Use more specific NER scorer * Fix import * Fix get_ner_prf * Add scorer * Fix scorer Co-authored-by: Ines Montani <ines@ines.io>	2020-09-24 20:38:57 +02:00
Matthew Honnibal	2abb4ba9db	Make a pre-check to speed up alignment cache (#6139) * Dirty trick to fast-track alignment cache * Improve alignment cache check * Fix header * Fix align cache * Fix align logic	2020-09-24 18:13:39 +02:00
Ines Montani	26e28ed413	Fix combined scores if multiple components report it	2020-09-24 17:11:13 +02:00
Ines Montani	0b52b6904c	Update entity_linker.py	2020-09-24 17:10:35 +02:00
Ines Montani	20b89a9717	Increment version [ci skip]	2020-09-24 16:57:02 +02:00
Adriane Boyd	3c062b3911	Add MORPH handling to Matcher (#6107) * Add MORPH handling to Matcher * Add `MORPH` to `Matcher` schema * Rename `_SetMemberPredicate` to `_SetPredicate` * Add `ISSUBSET` and `ISSUPERSET` operators to `_SetPredicate` * Add special handling for normalization and conversion of morph values into sets * For other attrs, `ISSUBSET` acts like `IN` and `ISSUPERSET` only matches for 0 or 1 values * Update test * Rename to IS_SUBSET and IS_SUPERSET	2020-09-24 16:55:09 +02:00
Adriane Boyd	59340606b7	Add option to disable Matcher errors (#6125) * Add option to disable Matcher errors * Add option to disable Matcher errors when a doc doesn't contain a particular type of annotation Minor additional change: * Update `AttributeRuler.load_from_morph_rules` to allow direct `MORPH` values * Rename suppress_errors to allow_missing Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com> * Refactor annotation checks in Matcher and PhraseMatcher Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>	2020-09-24 16:54:39 +02:00
Sofie Van Landeghem	c7eedd3534	updates to NEL functionality (#6132) * NEL: read sentences and ents from reference * fiddling with sent_start annotations * add KB serialization test * KB write additional file with strings.json * score_links function to calculate NEL P/R/F * formatting * documentation	2020-09-24 16:53:59 +02:00
Ines Montani	d0ef4a4cf5	Prevent division by zero in score weights	2020-09-24 16:42:13 +02:00
Matthew Honnibal	74ee456374	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2020-09-24 16:11:47 +02:00

13288 Commits