spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-09-21 19:39:13 +03:00

Author	SHA1	Message	Date
Edward	014da12f1d	Dont add tok2vec when efficiency textcat (#9502)	2021-10-20 17:30:19 +02:00
Sofie Van Landeghem	3fd3531e12	Docs for new spacy-trf architectures (#8954) * use TransformerModel.v2 in quickstart * update docs for new transformer architectures * bump spacy_transformers to 1.1.0 * Add new arguments spacy-transformers.TransformerModel.v3 * Mention that mixed-precision support is experimental * Describe delta transformers.Tok2VecTransformer versions * add dot * add dot, again * Update some more TransformerModel references v2 -> v3 * Add mixed-precision options to the training quickstart Disable mixed-precision training/prediction by default. * Update setup.cfg Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Apply suggestions from code review Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update website/docs/usage/embeddings-transformers.md Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: Daniël de Kok <me@danieldk.eu> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2021-10-18 14:15:06 +02:00
Adriane Boyd	8448c7dbc5	Update da trf recommendation (#8921) Update the da trf recommendation to the same model used in the pretrained pipelines.	2021-08-12 13:54:02 +02:00
Adriane Boyd	5aa099505f	Preserve paths.vectors/initialize.vectors setting in quickstart template	2021-06-23 11:07:14 +02:00
Sofie Van Landeghem	e796aab4b3	Resizable textcat (#7862) * implement textcat resizing for TextCatCNN * resizing textcat in-place * simplify code * ensure predictions for old textcat labels remain the same after resizing (WIP) * fix for softmax * store softmax as attr * fix ensemble weight copy and cleanup * restructure slightly * adjust documentation, update tests and quickstart templates to use latest versions * extend unit test slightly * revert unnecessary edits * fix typo * ensemble architecture won't be resizable for now * use resizable layer (WIP) * revert using resizable layer * resizable container while avoid shape inference trouble * cleanup * ensure model continues training after resizing * use fill_b parameter * use fill_defaults * resize_layer callback * format * bump thinc to 8.0.4 * bump spacy-legacy to 3.0.6	2021-06-16 11:45:00 +02:00
Adriane Boyd	cd6bd91c3a	Switch default train corpus max_length to 0 in quickstart (#8142) The behavior of `spacy.Corpus.v1` is unexpected enough for `max_length != 0` that `0` is a better default for users creating a new config with the quickstart. If not, documents are skipped, sometimes the entire corpus is skipped, and sometimes documents are (quite unexpectedly for your average user) split into sentences.	2021-05-20 14:48:09 +02:00
Adriane Boyd	d2bdaa7823	Replace negative rows with 0 in StaticVectors (#7674) * Replace negative rows with 0 in StaticVectors Replace negative row indices with 0-vectors in `StaticVectors`. * Increase versions related to StaticVectors * Increase versions of all architctures and layers related to `StaticVectors` * Improve efficiency of 0-vector operations Parallel `spacy-legacy` PR: https://github.com/explosion/spacy-legacy/pull/5 * Update config defaults to new versions * Update docs	2021-04-22 18:04:15 +10:00
Adriane Boyd	8a4200d4e9	Omit unused tok2vec/transformer components Omit unused tok2vec/transformer components in quickstart template.	2021-03-02 15:53:30 +01:00
Adriane Boyd	ee7bb0b393	Fix formatting in bg/bn quickstart recs	2021-02-26 17:08:37 +01:00
Ines Montani	1e3a326e53	Change Dutch transformer recommendation [ci skip] https://github.com/explosion/spaCy/discussions/6529#discussioncomment-366620	2021-02-14 15:30:16 +11:00
Adriane Boyd	0ee2ae86bf	Update trf quickstart recommendations Add/update trf recommendations for Bengali, Hindi, Sinhala, and Tamil based on #7044.	2021-02-12 15:55:17 +01:00
Adriane Boyd	35a863cd27	Remove nlp.tokenizer from quickstart template Remove `nlp.tokenizer` from quickstart template so that the default language-specific tokenizer settings are filled instead.	2021-02-01 11:20:12 +01:00
Ines Montani	78d6ff4dd4	Update quickstart recommendations	2021-01-28 11:14:49 +11:00
Ines Montani	ec5f55aa5b	Update config generation defaults and transformers (#6832)	2021-01-27 23:56:33 +11:00
Sofie Van Landeghem	75d9019343	Fix types of Tok2Vec encoding architectures (#6442) * fix TorchBiLSTMEncoder documentation * ensure the types of the encoding Tok2vec layers are correct * update references from v1 to v2 for the new architectures	2021-01-07 16:39:27 +11:00
Sofie Van Landeghem	afc5714d32	multi-label textcat component (#6474) * multi-label textcat component * formatting * fix comment * cleanup * fix from #6481 * random edit to push the tests * add explicit error when textcat is called with multi-label gold data * fix error nr * small fix	2021-01-06 13:07:14 +11:00
Sofie Van Landeghem	282a3b49ea	Fix parser resizing when there is no upper layer (#6460) * allow resizing of the parser model even when upper=False * update from spacy.TransitionBasedParser.v1 to v2 * bugfix	2020-12-18 18:56:57 +08:00
Adriane Boyd	fa8fa474a3	Add nlp.batch_size setting Add a default `batch_size` setting for `Language.pipe` and `Language.evaluate` as `nlp.batch_size`.	2020-12-09 09:13:26 +01:00
Sofie Van Landeghem	a0c899a0ff	Fix textcat + transformer architecture (#6371) * add pooling to textcat TransformerListener * maybe_get_dim in case it's null	2020-11-10 20:14:47 +08:00
Sofie Van Landeghem	75a202ce65	TextCat updates and fixes (#6263) * small fix in example imports * throw error when train_corpus or dev_corpus is not a string * small fix in custom logger example * limit macro_auc to labels with 2 annotations * fix typo * also create parents of output_dir if need be * update documentation of textcat scores * refactor TextCatEnsemble * fix tests for new AUC definition * bump to 3.0.0a42 * update docs * rename to spacy.TextCatEnsemble.v2 * spacy.TextCatEnsemble.v1 in legacy * cleanup * small fix * update to 3.0.0rc2 * fix import that got lost in merge * cursed IDE * fix two typos	2020-10-18 14:50:41 +02:00
Adriane Boyd	c8d04b79e2	Sort and add vectors for langs without transformers	2020-10-16 08:25:16 +02:00
Adriane Boyd	2fbd43c603	Use core lg models as vectors models in quickstart	2020-10-16 08:17:53 +02:00
Ines Montani	1f49300862	Update transformer recommendations [ci skip]	2020-10-13 15:41:17 +02:00
Matthew Honnibal	b7e01d2024	Fix quickstart	2020-10-05 21:21:30 +02:00
Matthew Honnibal	ff8b980775	Upd quickstart template	2020-10-05 21:19:41 +02:00
Adriane Boyd	22158dc24a	Add morphologizer to quickstart template	2020-10-02 15:06:16 +02:00
Ines Montani	fe3f111c37	Merge pull request #6168 from explosion/fix/default-corpus-values	2020-09-30 00:24:02 +02:00
Ines Montani	ae51843468	Remove augmenter from jinja template [ci skip]	2020-09-29 23:08:50 +02:00
Ines Montani	1aeef3bfbb	Make corpus paths default to None and improve errors	2020-09-29 22:33:46 +02:00
Ines Montani	d3c63b7965	Merge branch 'develop' into feature/prepare	2020-09-29 20:53:05 +02:00
Ines Montani	534e1ef498	Fix template	2020-09-29 17:02:55 +02:00
Ines Montani	1590de11b1	Update config	2020-09-28 12:05:23 +02:00
Matthew Honnibal	a976da168c	Support data augmentation in Corpus (#6155) * Support data augmentation in Corpus * Note initial docs for data augmentation * Add augmenter to quickstart * Fix flake8 * Format * Fix test * Update spacy/tests/training/test_training.py * Improve data augmentation arguments * Update templates * Move randomization out into caller * Refactor * Update spacy/training/augment.py * Update spacy/tests/training/test_training.py * Fix augment * Fix test	2020-09-28 03:03:27 +02:00
Ines Montani	ae51f580c1	Fix handling of score_weights	2020-09-24 10:27:33 +02:00
svlandeg	35dbc63578	Merge remote-tracking branch 'upstream/develop' into fix/nr_features # Conflicts: # spacy/ml/models/parser.py # spacy/tests/serialize/test_serialize_config.py # website/docs/api/architectures.md	2020-09-23 17:01:13 +02:00
svlandeg	dd2292793f	'parser' instead of 'deps' for state_type	2020-09-23 16:53:49 +02:00
svlandeg	6c85fab316	state_type and extra_state_tokens instead of nr_feature_tokens	2020-09-23 13:35:09 +02:00
Ines Montani	7745d77a38	Fix whitespace in template [ci skip]	2020-09-23 13:21:42 +02:00
Ines Montani	6ca06cb62c	Update docs and formatting [ci skip]	2020-09-23 10:14:27 +02:00
svlandeg	556f3e4652	add pooling to NEL's TransformerListener	2020-09-23 09:24:28 +02:00
svlandeg	085a1c8e2b	add no_output_layer to TextCatBOW config	2020-09-22 12:06:40 +02:00
svlandeg	e931f4d757	add textcat score	2020-09-22 10:56:43 +02:00
svlandeg	396b33257f	add entity_linker to jinja template	2020-09-22 10:40:05 +02:00
svlandeg	135de82a2d	add textcat to quickstart	2020-09-22 10:22:06 +02:00
Ines Montani	554c9a2497	Update docs [ci skip]	2020-09-20 12:30:53 +02:00
Sofie Van Landeghem	39872de1f6	Introducing the gpu_allocator (#6091) * rename 'use_pytorch_for_gpu_memory' to 'gpu_allocator' * --code instead of --code-path * update documentation * avoid querying the "system" section directly * add explanation of gpu_allocator to TF/PyTorch section in docs * fix typo * fix typo 2 * use set_gpu_allocator from thinc 8.0.0a34 * default null instead of empty string	2020-09-19 01:17:02 +02:00
svlandeg	0c35885751	generalize corpora, dot notation for dev and train corpus	2020-09-17 11:38:59 +02:00
svlandeg	51fa929f47	rewrite train_corpus to corpus.train in config	2020-09-15 21:58:04 +02:00
Matthew Honnibal	4b7abaafdb	Fix learn rate for non-transformer	2020-09-04 21:22:50 +02:00
Ines Montani	23b7d9cfa3	Prefix span getters	2020-09-03 17:37:06 +02:00

58 Commits