spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-11-11 04:08:09 +03:00

Author	SHA1	Message	Date
Adriane Boyd	86c3ec9c2b	Refactor Token morph setting (#6175) * Refactor Token morph setting * Remove `Token.morph_` * Add `Token.set_morph()` * `0` resets `token.c.morph` to unset * Any other values are passed to `Morphology.add` * Add token.morph setter to set from MorphAnalysis	2020-10-01 22:21:46 +02:00
Ines Montani	f2627157c8	Update docs [ci skip]	2020-10-01 17:38:17 +02:00
Adriane Boyd	27cbffff1b	Minor edit to CoNLL-U converter (#6172) This doesn't make a difference given how the `merged_morph` values override the `morph` values for all the final docs, but could have led to unexpected bugs in the future if the converter is modified.	2020-10-01 16:23:42 +02:00
Adriane Boyd	df98d3ef9f	Update import from collections.abc (#6174)	2020-10-01 16:21:49 +02:00
Ines Montani	44160cd52f	Tidy up [ci skip]	2020-10-01 10:41:19 +02:00
Ines Montani	a103ab5f1a	Update augmenter lookups and docs	2020-09-30 23:03:47 +02:00
Matthew Honnibal	c379a4274a	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2020-09-30 16:52:42 +02:00
Matthew Honnibal	e58dca3028	Add read_labels	2020-09-30 16:52:27 +02:00
Ines Montani	fe3f111c37	Merge pull request #6168 from explosion/fix/default-corpus-values	2020-09-30 00:24:02 +02:00
Matthew Honnibal	f52249fe2e	Fix data augmentation	2020-09-29 23:40:54 +02:00
Matthew Honnibal	14c4da547f	Try to fix augmentation	2020-09-29 23:08:56 +02:00
Ines Montani	df8dd91b6f	Merge branch 'develop' into fix/default-corpus-values	2020-09-29 22:55:39 +02:00
Ines Montani	ad6d40d028	Add logging	2020-09-29 22:53:14 +02:00
Ines Montani	1aeef3bfbb	Make corpus paths default to None and improve errors	2020-09-29 22:33:46 +02:00
Ines Montani	fa47f87924	Tidy up and auto-format	2020-09-29 21:39:28 +02:00
Ines Montani	d3c63b7965	Merge branch 'develop' into feature/prepare	2020-09-29 20:53:05 +02:00
Ines Montani	2be80379ec	Fix small issues, resolve_dot_names and debug model	2020-09-29 20:38:35 +02:00
Ines Montani	fd594cfb9b	Tighten up format	2020-09-29 16:47:55 +02:00
Ines Montani	978ab54a84	Fix logging	2020-09-29 16:22:41 +02:00
Ines Montani	aa2a6882d0	Fix logging	2020-09-29 16:08:39 +02:00
Ines Montani	63d1598137	Simplify config use in Language.initialize	2020-09-29 16:05:48 +02:00
Ines Montani	612bbf85ab	Update initialize.py	2020-09-29 12:14:47 +02:00
Ines Montani	42f0e4c946	Clean up	2020-09-29 12:14:08 +02:00
Ines Montani	78396d137f	Integrate initialize settings	2020-09-29 11:57:08 +02:00
Ines Montani	4925ad760a	Add init vectors	2020-09-29 10:58:50 +02:00
Ines Montani	ff9a63bfbd	begin_training -> initialize	2020-09-28 21:35:09 +02:00
Ines Montani	046f655d86	Fix error	2020-09-28 21:17:45 +02:00
Ines Montani	a139fe672b	Fix typos and refactor CLI logging	2020-09-28 21:17:10 +02:00
Ines Montani	2e9c9e74af	Fix config resolution and interpolation TODO: auto-interpolate in Thinc if config is dict (i.e. likely subsection)	2020-09-28 15:34:00 +02:00
Ines Montani	822ea4ef61	Refactor CLI	2020-09-28 15:09:59 +02:00
Ines Montani	d5155376fd	Update vocab init	2020-09-28 11:30:18 +02:00
Matthew Honnibal	a976da168c	Support data augmentation in Corpus (#6155) * Support data augmentation in Corpus * Note initial docs for data augmentation * Add augmenter to quickstart * Fix flake8 * Format * Fix test * Update spacy/tests/training/test_training.py * Improve data augmentation arguments * Update templates * Move randomization out into caller * Refactor * Update spacy/training/augment.py * Update spacy/tests/training/test_training.py * Fix augment * Fix test	2020-09-28 03:03:27 +02:00
Matthew Honnibal	13b1605ee6	Add init script	2020-09-28 01:08:49 +02:00
Matthew Honnibal	26afd3bd90	Fix iteration order	2020-09-25 21:47:22 +02:00
Matthew Honnibal	3d8388969e	Sort paths for cache consistency	2020-09-25 19:07:26 +02:00
Sofie Van Landeghem	009ba14aaf	Fix pretraining in train script (#6143) * update pretraining API in train CLI * bump thinc to 8.0.0a35 * bump to 3.0.0a26 * doc fixes * small doc fix	2020-09-25 15:47:10 +02:00
Matthew Honnibal	93d7ff309f	Remove print	2020-09-24 21:05:27 +02:00
Matthew Honnibal	2abb4ba9db	Make a pre-check to speed up alignment cache (#6139) * Dirty trick to fast-track alignment cache * Improve alignment cache check * Fix header * Fix align cache * Fix align logic	2020-09-24 18:13:39 +02:00
Ines Montani	58dde293ce	Merge pull request #6089 from adrianeboyd/feature/doc-ents-v3-2	2020-09-24 14:44:42 +02:00
Ines Montani	f69fea8b25	Improve error handling around non-number scores	2020-09-24 11:29:07 +02:00
Ines Montani	4eb39b5c43	Fix logging	2020-09-24 11:04:35 +02:00
svlandeg	20b0ec5dcf	avoid logging performance of frozen components	2020-09-23 10:37:12 +02:00
Ines Montani	ae5dacf75f	Tidy up and add types	2020-09-23 10:14:34 +02:00
svlandeg	4a56ea72b5	fallbacks for old names	2020-09-23 09:15:07 +02:00
Adriane Boyd	535842e483	Merge branch 'develop' into feature/doc-ents-v3-2	2020-09-22 13:45:50 +02:00
svlandeg	b556a10808	rename converts in_to_out	2020-09-22 11:50:19 +02:00
Ines Montani	67fbcb3da5	Tidy up tests and docs	2020-09-21 20:43:54 +02:00
Adriane Boyd	177df15d89	Implement Doc.set_ents	2020-09-21 15:54:05 +02:00
Adriane Boyd	bc02e86494	Extend Doc.__init__ with additional annotation Mostly copying from `spacy.tests.util.get_doc`, add additional kwargs to `Doc.__init__` to initialize the most common doc/token values.	2020-09-21 13:36:24 +02:00
Adriane Boyd	8b650f3a78	Modify setting missing and blocked entity tokens In order to make it easier to construct `Doc` objects as training data, modify how missing and blocked entity tokens are set to prioritize setting `O` and missing entity tokens for training purposes over setting blocked entity tokens. * `Doc.ents` setter sets tokens outside entity spans to `O` regardless of the current state of each token * For `Doc.ents`, setting a span with a missing label sets the `ent_iob` to missing instead of blocked * `Doc.block_ents(spans)` marks spans as hard `O` for use with the `EntityRecognizer`	2020-09-17 21:27:42 +02:00
Adriane Boyd	7e4cd7575c	Refactor Docs.is_ flags (#6044) * Refactor Docs.is_ flags * Add derived `Doc.has_annotation` method * `Doc.has_annotation(attr)` returns `True` for partial annotation * `Doc.has_annotation(attr, require_complete=True)` returns `True` for complete annotation * Add deprecation warnings to `is_tagged`, `is_parsed`, `is_sentenced` and `is_nered` * Add `Doc._get_array_attrs()`, which returns a full list of `Doc` attrs for use with `Doc.to_array`, `Doc.to_bytes` and `Doc.from_docs`. The list is the `DocBin` attributes list plus `SPACY` and `LENGTH`. Notes on `Doc.has_annotation`: * `HEAD` is converted to `DEP` because heads don't have an unset state * Accept `IS_SENT_START` as a synonym of `SENT_START` Additional changes: * Add `NORM`, `ENT_ID` and `SENT_START` to default attributes for `DocBin` * In `Doc.from_array()` the presence of `DEP` causes `HEAD` to override `SENT_START` * In `Doc.from_array()` using `attrs` other than `Doc._get_array_attrs()` (i.e., a user's custom list rather than our default internal list) with both `HEAD` and `SENT_START` shows a warning that `HEAD` will override `SENT_START` * `set_children_from_heads` does not require dependency labels to set sentence boundaries and sets `sent_start` for all non-sentence starts to `-1` * Fix call to set_children_form_heads Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>	2020-09-17 00:14:01 +02:00
svlandeg	7677e5c0e2	fix wandb logger when calling multiple times from same script	2020-09-15 12:56:33 +02:00
Ines Montani	154752f9c2	Update docs and consistency [ci skip]	2020-09-15 00:32:49 +02:00
Matthew Honnibal	ae15fa9688	Fix iob converter	2020-09-14 21:02:18 +02:00
Ines Montani	61a4ef0b46	Fix syntax error	2020-09-13 19:23:09 +02:00
Matthew Honnibal	b693d2d224	Fix speed report in table	2020-09-13 17:39:31 +02:00
Matthew Honnibal	54c40223a1	Improve v3 pretrain command (#6040) * Starts to run * Update pretrain script * Update corpus * Update pretrain schema * Remove outdated test * Make JsonlTexts produce Example objects.	2020-09-13 14:05:05 +02:00
Sofie Van Landeghem	e92e850c72	Raise if empty examples (#6052) * raise error if no valid Example objects were found during initialization * fix max_length parameter * remove commit from other branch Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>	2020-09-12 21:01:53 +02:00
Sofie Van Landeghem	8e7557656f	Renaming gold & annotation_setter (#6042) * version bump to 3.0.0a16 * rename "gold" folder to "training" * rename 'annotation_setter' to 'set_extra_annotations' * formatting	2020-09-09 10:31:03 +02:00

209 Commits