spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-11-13 13:17:06 +03:00

Author	SHA1	Message	Date
Adriane Boyd	86c3ec9c2b	Refactor Token morph setting (#6175) * Refactor Token morph setting * Remove `Token.morph_` * Add `Token.set_morph()` * `0` resets `token.c.morph` to unset * Any other values are passed to `Morphology.add` * Add token.morph setter to set from MorphAnalysis	2020-10-01 22:21:46 +02:00
Ines Montani	f2627157c8	Update docs [ci skip]	2020-10-01 17:38:17 +02:00
Adriane Boyd	27cbffff1b	Minor edit to CoNLL-U converter (#6172) This doesn't make a difference given how the `merged_morph` values override the `morph` values for all the final docs, but could have led to unexpected bugs in the future if the converter is modified.	2020-10-01 16:23:42 +02:00
Adriane Boyd	df98d3ef9f	Update import from collections.abc (#6174)	2020-10-01 16:21:49 +02:00
Ines Montani	44160cd52f	Tidy up [ci skip]	2020-10-01 10:41:19 +02:00
Ines Montani	a103ab5f1a	Update augmenter lookups and docs	2020-09-30 23:03:47 +02:00
Matthew Honnibal	c379a4274a	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2020-09-30 16:52:42 +02:00
Matthew Honnibal	e58dca3028	Add read_labels	2020-09-30 16:52:27 +02:00
Ines Montani	fe3f111c37	Merge pull request #6168 from explosion/fix/default-corpus-values	2020-09-30 00:24:02 +02:00
Matthew Honnibal	f52249fe2e	Fix data augmentation	2020-09-29 23:40:54 +02:00
Matthew Honnibal	14c4da547f	Try to fix augmentation	2020-09-29 23:08:56 +02:00
Ines Montani	df8dd91b6f	Merge branch 'develop' into fix/default-corpus-values	2020-09-29 22:55:39 +02:00
Ines Montani	ad6d40d028	Add logging	2020-09-29 22:53:14 +02:00
Ines Montani	1aeef3bfbb	Make corpus paths default to None and improve errors	2020-09-29 22:33:46 +02:00
Ines Montani	fa47f87924	Tidy up and auto-format	2020-09-29 21:39:28 +02:00
Ines Montani	d3c63b7965	Merge branch 'develop' into feature/prepare	2020-09-29 20:53:05 +02:00
Ines Montani	2be80379ec	Fix small issues, resolve_dot_names and debug model	2020-09-29 20:38:35 +02:00
Ines Montani	fd594cfb9b	Tighten up format	2020-09-29 16:47:55 +02:00
Ines Montani	978ab54a84	Fix logging	2020-09-29 16:22:41 +02:00
Ines Montani	aa2a6882d0	Fix logging	2020-09-29 16:08:39 +02:00
Ines Montani	63d1598137	Simplify config use in Language.initialize	2020-09-29 16:05:48 +02:00
Ines Montani	612bbf85ab	Update initialize.py	2020-09-29 12:14:47 +02:00
Ines Montani	42f0e4c946	Clean up	2020-09-29 12:14:08 +02:00
Ines Montani	78396d137f	Integrate initialize settings	2020-09-29 11:57:08 +02:00
Ines Montani	4925ad760a	Add init vectors	2020-09-29 10:58:50 +02:00
Ines Montani	ff9a63bfbd	begin_training -> initialize	2020-09-28 21:35:09 +02:00
Ines Montani	046f655d86	Fix error	2020-09-28 21:17:45 +02:00
Ines Montani	a139fe672b	Fix typos and refactor CLI logging	2020-09-28 21:17:10 +02:00
Ines Montani	2e9c9e74af	Fix config resolution and interpolation TODO: auto-interpolate in Thinc if config is dict (i.e. likely subsection)	2020-09-28 15:34:00 +02:00
Ines Montani	822ea4ef61	Refactor CLI	2020-09-28 15:09:59 +02:00
Ines Montani	d5155376fd	Update vocab init	2020-09-28 11:30:18 +02:00
Matthew Honnibal	a976da168c	Support data augmentation in Corpus (#6155) * Support data augmentation in Corpus * Note initial docs for data augmentation * Add augmenter to quickstart * Fix flake8 * Format * Fix test * Update spacy/tests/training/test_training.py * Improve data augmentation arguments * Update templates * Move randomization out into caller * Refactor * Update spacy/training/augment.py * Update spacy/tests/training/test_training.py * Fix augment * Fix test	2020-09-28 03:03:27 +02:00
Matthew Honnibal	13b1605ee6	Add init script	2020-09-28 01:08:49 +02:00
Matthew Honnibal	26afd3bd90	Fix iteration order	2020-09-25 21:47:22 +02:00
Matthew Honnibal	3d8388969e	Sort paths for cache consistency	2020-09-25 19:07:26 +02:00
Sofie Van Landeghem	009ba14aaf	Fix pretraining in train script (#6143) * update pretraining API in train CLI * bump thinc to 8.0.0a35 * bump to 3.0.0a26 * doc fixes * small doc fix	2020-09-25 15:47:10 +02:00
Matthew Honnibal	93d7ff309f	Remove print	2020-09-24 21:05:27 +02:00
Matthew Honnibal	2abb4ba9db	Make a pre-check to speed up alignment cache (#6139) * Dirty trick to fast-track alignment cache * Improve alignment cache check * Fix header * Fix align cache * Fix align logic	2020-09-24 18:13:39 +02:00
Ines Montani	58dde293ce	Merge pull request #6089 from adrianeboyd/feature/doc-ents-v3-2	2020-09-24 14:44:42 +02:00
Ines Montani	f69fea8b25	Improve error handling around non-number scores	2020-09-24 11:29:07 +02:00
Ines Montani	4eb39b5c43	Fix logging	2020-09-24 11:04:35 +02:00
svlandeg	20b0ec5dcf	avoid logging performance of frozen components	2020-09-23 10:37:12 +02:00
Ines Montani	ae5dacf75f	Tidy up and add types	2020-09-23 10:14:34 +02:00
svlandeg	4a56ea72b5	fallbacks for old names	2020-09-23 09:15:07 +02:00
Adriane Boyd	535842e483	Merge branch 'develop' into feature/doc-ents-v3-2	2020-09-22 13:45:50 +02:00
svlandeg	b556a10808	rename converts in_to_out	2020-09-22 11:50:19 +02:00
Ines Montani	67fbcb3da5	Tidy up tests and docs	2020-09-21 20:43:54 +02:00
Adriane Boyd	177df15d89	Implement Doc.set_ents	2020-09-21 15:54:05 +02:00
Adriane Boyd	bc02e86494	Extend Doc.__init__ with additional annotation Mostly copying from `spacy.tests.util.get_doc`, add additional kwargs to `Doc.__init__` to initialize the most common doc/token values.	2020-09-21 13:36:24 +02:00
Adriane Boyd	8b650f3a78	Modify setting missing and blocked entity tokens In order to make it easier to construct `Doc` objects as training data, modify how missing and blocked entity tokens are set to prioritize setting `O` and missing entity tokens for training purposes over setting blocked entity tokens. * `Doc.ents` setter sets tokens outside entity spans to `O` regardless of the current state of each token * For `Doc.ents`, setting a span with a missing label sets the `ent_iob` to missing instead of blocked * `Doc.block_ents(spans)` marks spans as hard `O` for use with the `EntityRecognizer`	2020-09-17 21:27:42 +02:00

59 Commits