spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-11-03 01:17:52 +03:00

Author	SHA1	Message	Date
Adriane Boyd	0e7f94b247	Update Tokenizer.explain with special matches (#7749) * Update Tokenizer.explain with special matches Update `Tokenizer.explain` and the pseudo-code in the docs to include the processing of special cases that contain affixes or whitespace. * Handle optional settings in explain * Add test for special matches in explain Add test for `Tokenizer.explain` for special cases containing affixes.	2021-04-19 19:08:20 +10:00
Bram Vanroy	ed561cf428	Terminology: deprecated vs obsolete (#7621) * Terminology: deprecated vs obsolete Typically, deprecated is used for functionality that is bound to become unavailable but that can still be used. Obsolete is used for features that have been removed. In E941, I think what is meant is "obsolete" since loading a model by a shortcut simply does not work anymore (and throws an error). This is different from downloading a model with a shortcut, which is deprecated but still works. In light of this, perhaps all other error codes should be checked as well. * clarify that the link command is removed and not just deprecated Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>	2021-04-12 14:37:00 +02:00
Adriane Boyd	673e2bc4c0	Add usage docs for streamed train corpora (#7693)	2021-04-09 16:15:38 +02:00
Ayush Chaurasia	3c2ce41dd8	W&B integration: Optional support for dataset and model checkpoint logging and versioning (#7429 ) * Add optional artifacts logging * Update docs * Update spacy/training/loggers.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update spacy/training/loggers.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update spacy/training/loggers.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Bump WandbLogger Version * Add documentation of v1 to legacy docs * bump spacy-legacy to 3.0.2 (to be released) Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>	2021-04-01 19:36:23 +02:00
Santiago Castro	af07fc3bc1	Add support for CUDA 11.2 (#7583) * Add support for CUDA 11.2 * Update the docs * Format Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2021-03-30 09:47:33 +02:00
Álvaro Abella Bascarán	5b4dde38a3	fix fn name: tokenizer.infixes_finditer -> tokenizer.infix_finditer (#7606)	2021-03-30 09:45:49 +02:00
Adriane Boyd	0d2b723e8d	Update entity setting section	2021-03-20 11:38:55 +01:00
Adriane Boyd	6a9a467766	Update website/docs/usage/processing-pipelines.md Co-authored-by: Ines Montani <ines@ines.io>	2021-03-19 08:12:49 +01:00
Adriane Boyd	40e5d3a980	Update saving/loading example	2021-03-18 16:56:10 +01:00
Adriane Boyd	0fb1881f36	Reformat processing pipelines	2021-03-18 13:31:42 +01:00
Adriane Boyd	acc58719da	Update custom similarity hooks example	2021-03-18 13:31:42 +01:00
Adriane Boyd	c9e1a9ac17	Add multiprocessing section	2021-03-18 13:31:42 +01:00
Adriane Boyd	9a254d3995	Include all en_core_web_sm components in examples	2021-03-18 13:31:42 +01:00
bsweileh	61472e7cb3	Update _training.md - Fix broken link on backpropagation (#7431) * Update _training.md Fix broken link on backpropagation * Add agreement add spacy contributor agreement	2021-03-15 09:21:35 +01:00
Adriane Boyd	d746ea6278	Add warning about GPU selection in Jupyter notebooks (#7075) * Initial warning * Update check * Redo edit * Move jupyter warning to helper method * Add link with details to warnings	2021-03-09 15:35:21 +01:00
Sofie Van Landeghem	932887b950	textcat scoring fix and multi_label docs (#6974 ) * add multi-label textcat to menu * add infobox on textcat API * add info to v3 migration guide * small edits * further fixes in doc strings * add infobox to textcat architectures * add textcat_multilabel to overview of built-in components * spelling * fix unrelated warn msg * Add textcat_multilabel to quickstart [ci skip] * remove separate documentation page for multilabel_textcategorizer * small edits * positive label clarification * avoid duplicating information in self.cfg and fix textcat.score * fix multilabel textcat too * revert threshold to storage in cfg * revert threshold stuff for multi-textcat Co-authored-by: Ines Montani <ines@ines.io>	2021-03-09 23:04:22 +11:00
Ines Montani	dfb23a419e	Merge branch 'spacy.io' [ci skip]	2021-03-06 17:38:54 +11:00
graue70	7d085d5b1c	Fix typo in docs	2021-03-05 18:30:09 +01:00
svlandeg	d900c55061	consistently use registry as callable	2021-03-02 17:56:28 +01:00
svlandeg	08fd901a1b	kb.get_candidates renamed to get_alias_candidates	2021-02-25 20:09:36 +01:00
Ines Montani	24cecbb3f4	Merge pull request #7126 from adrianeboyd/docs/gpu-id-opt [ci skip] Add tip about --gpu-id to training quickstart	2021-02-24 22:34:17 +11:00
Tocic	b1996a51a1	fix typo in models.md (#7157)	2021-02-22 09:00:38 +01:00
Adriane Boyd	7198be0f4b	Add tip about --gpu-id to training quickstart	2021-02-19 14:07:51 +01:00
Sofie Van Landeghem	709c9e75af	span.ent only returns first sentence (#7084) * return first sentence when span contains sentence boundary * docs fix * small fixes * cleanup	2021-02-19 23:02:38 +11:00
palandlom	9b82586699	var batch is useless (#7111) It seems that nlp.update(examples) should be nlp.update(batch)	2021-02-18 09:44:22 +01:00
Ines Montani	fc4fb6eb3a	Make v2.x docs more prominent [ci skip]	2021-02-17 23:42:27 +11:00
Ines Montani	c08b3f294c	Support env vars and CLI overrides for project.yml	2021-02-10 13:45:27 +11:00
svlandeg	9a7f33c916	final 3.0 benchmark numbers	2021-02-09 21:28:33 +01:00
svlandeg	bb7482bef8	fix link	2021-02-08 18:39:59 +01:00
Ines Montani	433835d9b0	Merge pull request #6889 from adrianeboyd/docs/source-install-dup [ci skip]	2021-02-05 13:35:16 +11:00
Ines Montani	2cdfcd2d19	Update naming [ci skip]	2021-02-03 12:48:31 +11:00
Adriane Boyd	37a68a06ab	Update to recommend editable installs for source installs	2021-02-02 16:51:27 +01:00
Adriane Boyd	3a3e4daf60	Update install instructions * Remove duplicate section about compiling from source	2021-02-02 14:44:15 +01:00
Pengcheng YIN	6fdc33203a	Fix a typo	2021-02-01 17:26:28 -05:00
Ines Montani	a59f3fcf5d	Make wheel the default format and update docs [ci skip]	2021-02-01 23:18:43 +11:00
Ines Montani	31b842d6ce	Update table [ci skip]	2021-02-01 14:17:52 +11:00
Ines Montani	7752f80f39	Update docs [ci skip]	2021-01-31 16:11:24 +11:00
Ines Montani	a8a1231ccd	Update README and docs [ci skip]	2021-01-31 12:36:04 +11:00
Ines Montani	ae07416fda	Merge branch 'website/v3-launch' into develop	2021-01-30 20:31:06 +11:00
Ines Montani	2332c4280b	Update and use unified --build option	2021-01-30 13:11:36 +11:00
Ines Montani	2609ba4e89	Support building wheel in spacy package	2021-01-30 11:54:02 +11:00
Ines Montani	95e958a229	Merge pull request #6852 from explosion/feature/replace-listeners	2021-01-30 00:58:08 +11:00
Ines Montani	7694f76dd1	Update warning and mention replace_listeners	2021-01-29 23:46:01 +11:00
Adriane Boyd	8b76cb8095	Rephrase transformers PyTorch instructions	2021-01-29 13:36:56 +01:00
Adriane Boyd	e3e87e7275	Update transfomers install docs * Recommend installing PyTorch separately * Add instructions for `sentencepiece`	2021-01-29 13:27:43 +01:00
Ines Montani	99af9e7125	Update documentation	2021-01-29 18:45:48 +11:00
Ines Montani	35d79c0a5d	Adjust formatting [ci skip]	2021-01-27 13:31:25 +11:00
Ines Montani	5d79d1af50	Merge pull request #6796 from svlandeg/docs/benchmarks [ci skip]	2021-01-27 13:01:23 +11:00
Ines Montani	1ed7029d47	Update website for v3 launch	2021-01-27 12:39:47 +11:00
Adriane Boyd	61c9f8bf24	Remove transformers model max length section (#6807)	2021-01-25 19:59:34 +08:00
svlandeg	56064faed9	update caption	2021-01-23 00:57:00 +01:00
svlandeg	d7c0f40a96	update comment	2021-01-22 18:55:18 +01:00
svlandeg	a071279bc7	add speed comparison to docs	2021-01-22 18:46:35 +01:00
svlandeg	b132cb3036	update accuracies for new a1 models	2021-01-21 20:24:05 +01:00
Sofie Van Landeghem	e680efc7cc	Set annotations in update (#6767) * bump to 3.0.0rc4 * do set_annotations in component update calls * update docs and remove set_annotations flag * fix EL test	2021-01-20 11:49:25 +11:00
Sofie Van Landeghem	57640aa838	warn when frozen components break listener pattern (#6766) * warn when frozen components break listener pattern * few notes in the documentation * update arg name * formatting * cleanup * specify listeners return type	2021-01-20 11:12:35 +11:00
Ines Montani	4a1029a9b6	Add infobox [ci skip]	2021-01-19 19:18:39 +11:00
Sofie Van Landeghem	fed8f48965	raise NotImplementedError when noun_chunks iterator is not implemented (#6711) * raise NotImplementedError when noun_chunks iterator is not implemented * bring back, fix and document span.noun_chunks * formatting Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>	2021-01-17 19:56:05 +08:00
Adriane Boyd	bf0cdae8d4	Add token_splitter component (#6726 ) * Add long_token_splitter component Add a `long_token_splitter` component for use with transformer pipelines. This component splits up long tokens like URLs into smaller tokens. This is particularly relevant for pretrained pipelines with `strided_spans`, since the user can't change the length of the span `window` and may not wish to preprocess the input texts. The `long_token_splitter` splits tokens that are at least `long_token_length` tokens long into smaller tokens of `split_length` size. Notes: * Since this is intended for use as the first component in a pipeline, the token splitter does not try to preserve any token annotation. * API docs to come when the API is stable. * Adjust API, add test * Fix name in factory	2021-01-17 19:54:41 +08:00
Matthew Honnibal	f277bfdf0f	Add SpanGroup and Graph container types to represent arbitrary annotations (#6696 ) * Draft out initial Spans data structure * Initial span group commit * Basic span group support on Doc * Basic test for span group * Compile span_group.pyx * Draft addition of SpanGroup to DocBin * Add deserialization for SpanGroup * Add tests for serializing SpanGroup * Fix serialization of SpanGroup * Add EdgeC and GraphC structs * Add draft Graph data structure * Compile graph * More work on Graph * Update GraphC * Upd graph * Fix walk functions * Let Graph take nodes and edges on construction * Fix walking and getting * Add graph tests * Fix import * Add module with the SpanGroups dict thingy * Update test * Rename 'span_groups' attribute * Try to fix c++11 compilation * Fix test * Update DocBin * Try to fix compilation * Try to fix graph * Improve SpanGroup docstrings * Add doc.spans to documentation * Fix serialization * Tidy up and add docs * Update docs [ci skip] * Add SpanGroup.has_overlap * WIP updated Graph API * Start testing new Graph API * Update Graph tests * Update Graph * Add docstring Co-authored-by: Ines Montani <ines@ines.io>	2021-01-14 17:30:41 +11:00
Adriane Boyd	a45d89f09a	Add initialize.before_init and after_init callbacks Add `initialize.before_init` and `initialize.after_init` callbacks to the config. The `initialize.before_init` callback is a place to implement one-time tokenizer customizations that are then saved with the model.	2021-01-12 13:07:44 +01:00
Sofie Van Landeghem	a612a5ba3f	fix small typos (#6698)	2021-01-08 09:39:47 +01:00
Sofie Van Landeghem	75d9019343	Fix types of Tok2Vec encoding architectures (#6442) * fix TorchBiLSTMEncoder documentation * ensure the types of the encoding Tok2vec layers are correct * update references from v1 to v2 for the new architectures	2021-01-07 16:39:27 +11:00
Sofie Van Landeghem	82ae95267a	Docs for pretrain architectures (#6605) * document pretraining architectures * formatting * bit more info * small fixes	2021-01-06 16:12:30 +11:00
Sofie Van Landeghem	afc5714d32	multi-label textcat component (#6474) * multi-label textcat component * formatting * fix comment * cleanup * fix from #6481 * random edit to push the tests * add explicit error when textcat is called with multi-label gold data * fix error nr * small fix	2021-01-06 13:07:14 +11:00
Ines Montani	85ca8c2bdd	Merge branch 'master' into develop	2020-12-11 13:44:41 +11:00
Ines Montani	fb43a30a71	Merge pull request #6545 from svlandeg/feature/discussions [ci skip]	2020-12-11 10:20:35 +11:00
svlandeg	5afa567767	replace gitter with discussions in 101	2020-12-10 20:17:36 +01:00
Adriane Boyd	27bb75e2a0	Docs and extras updates for v2.3.5 * Update install instructions for updated packages * Add `cuda110` and `cuda111` extras, remove upper `cupy` pins (only compatible with `thinc>=7.4.4`)	2020-12-10 15:34:34 +01:00
Ines Montani	513c4e332a	Include custom code via spacy package command (#6531)	2020-12-10 20:36:46 +08:00
Ines Montani	1980203229	Merge branch 'master' into pr/6444	2020-12-09 11:09:40 +11:00
Ines Montani	05a2812ae0	Merge branch 'develop' into pr/6444	2020-12-09 11:04:03 +11:00
Ines Montani	8921364579	Merge pull request #6521 from explosion/feature/config-stdin Allow reading config from stdin in spacy train	2020-12-08 22:07:43 +11:00
Ines Montani	94a5a9814f	Update argument handling and documentation	2020-12-08 20:41:18 +11:00
Ines Montani	ef59ce783b	Adjust install instructions [ci skip]	2020-12-08 18:06:50 +11:00
Ines Montani	d8e01ca931	Merge pull request #6391 from adrianeboyd/docs/install-guide	2020-12-08 07:42:16 +01:00
Ines Montani	c2b196c2c1	Merge pull request #6419 from svlandeg/feature/rel-docs	2020-12-08 06:30:41 +01:00
Adriane Boyd	1442d2f213	Improve simple training example in v3 migration (#6438) * Create the examples once * Use the examples in the initialization * Provide the batch size * Fix `begin_training` migration example	2020-11-30 09:39:45 +08:00
Adriane Boyd	03ae77e603	Add SPACY as a Matcher attribute (#6463)	2020-11-30 09:34:50 +08:00
Adriane Boyd	724831b066	Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master * Update Macedonian for v3 * Update Turkish for v3	2020-11-25 11:49:34 +01:00
Jacob Bortell	fe9009911a	Update rule-based-matching.md (#6421 ) * Update rule-based-matching.md Clarified case-sensititivy of dictionary-referencing attributes (POS/TAG/DEP/etc). Clarified "Type" column header to "Value Type" * Update rule-based-matching.md Improved clarity of wording	2020-11-24 16:20:19 +01:00
Adriane Boyd	6f133877aa	Update source install instructions * Don't recommend an editable install in the default source instructions. * Use `pip install --no-build-isolation` for editable installs. * Remove reference to `virtualenv`.	2020-11-24 14:44:13 +01:00
svlandeg	218abaa69a	typo	2020-11-20 22:36:49 +01:00
svlandeg	e861e928df	more small corrections	2020-11-20 22:29:58 +01:00
svlandeg	5ac0867427	final fixes	2020-11-20 22:18:53 +01:00
svlandeg	331ec83493	edits and updates to implementing REL component docs	2020-11-20 21:41:52 +01:00
svlandeg	4a3e611abc	small fixes and formatting	2020-11-20 15:55:05 +01:00
svlandeg	124f49feb6	update REL model code	2020-11-20 15:25:20 +01:00
Adriane Boyd	96726ec1f6	Fix DocBin init in training example (#6396)	2020-11-17 14:36:44 +01:00
Adriane Boyd	ed32fa80cd	Update source install instructions * Use `pip install` instead of `python setup.py install` * For developers recommend: * `python setup.py build_ext --inplace -j N` * `python setup.py develop`	2020-11-16 10:13:51 +01:00
svlandeg	99d0412b6e	add link to REL project	2020-11-15 18:35:56 +01:00
Ines Montani	de6453940e	Merge pull request #6305 from svlandeg/feature/score-docs [ci skip]	2020-11-10 02:52:11 +01:00
Ines Montani	d7950c5ada	Merge pull request #6297 from adrianeboyd/docs/nightly-conda-install [ci skip]	2020-11-10 02:45:52 +01:00
Ines Montani	363ac73c72	Update docs [ci skip]	2020-11-09 12:43:26 +08:00
Ines Montani	019a1dd5e8	Fix v3 overview [ci skip]	2020-11-03 18:10:06 +01:00
Adriane Boyd	dc816bba9d	Fix node name typo in dependency matcher example (#6311)	2020-10-28 16:32:46 +01:00
svlandeg	77688b0072	fix config	2020-10-26 11:14:34 +01:00
svlandeg	5878ff6bcd	cleanup	2020-10-26 11:13:02 +01:00
svlandeg	e95d9caa87	small edits	2020-10-26 11:09:25 +01:00
svlandeg	a664994a81	adding score method to explanation of new component	2020-10-26 10:52:47 +01:00


988 Commits