spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-01-06 07:16:29 +03:00

Author	SHA1	Message	Date
Cass	c84eb478eb	Fix a command typo in models.md "dowmload" -> "download"	2021-07-06 14:46:14 +09:00
Ines Montani	218ed81f84	Add docs notes on installing models from Python and in Jupyter [ci skip] (#8597 )	2021-07-05 13:52:15 +02:00
Nick Sorros	5467bd3655	Switch model and data path in prodigy project.yml recipe (#8467 )	2021-06-22 09:43:48 +02:00
Ines Montani	5c46edbf0d	Add link anchor [ci skip]	2021-06-20 11:29:31 +10:00
Sofie Van Landeghem	4b81f58eda	fix docs (#8200 )	2021-05-27 10:50:46 +02:00
Paul O'Leary McCann	68ccfc4c39	Fix docs (fix #8189 )	2021-05-24 19:49:21 +09:00
Adriane Boyd	e0b0892ef2	Minor updates to quickstart settings/instructions (#7965 ) * Minor updates to quickstart settings/instructions * set default value of textcat exclusive to `false` until the default checkbox behavior is updated * add the `morphologizer` to the list of components * add a note that v3.0.6+ is required * Switch to warning above quickstart * Undo changes to textcat default in quickstart Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-05-17 16:55:50 +02:00
Adriane Boyd	40ca23bde0	Fix new version for match_alignments (#8021 )	2021-05-07 09:56:22 +02:00
Adriane Boyd	ffaa0d6b9b	Fix Transformer.initialize example (#7963 )	2021-04-30 12:21:59 +02:00
Adriane Boyd	bdb485cc80	Add callback to copy vocab/tokenizer from model (#7750 ) * Add callback to copy vocab/tokenizer from model Add callback `spacy.copy_from_base_model.v1` to copy the tokenizer settings and/or vocab (including vectors) from a base model. * Move spacy.copy_from_base_model.v1 to spacy.training.callbacks * Add documentation * Modify to specify model as tokenizer and vocab params	2021-04-22 12:36:50 +02:00
Adriane Boyd	f68fc29130	Update sent_starts in Example.from_dict (#7847 ) * Update sent_starts in Example.from_dict Update `sent_starts` for `Example.from_dict` so that `Optional[bool]` values have the same meaning as for `Token.is_sent_start`. Use `Optional[bool]` as the type for sent start values in the docs. * Use helper function for conversion to ternary ints	2021-04-22 11:32:45 +02:00
Adriane Boyd	d2bdaa7823	Replace negative rows with 0 in StaticVectors (#7674 ) * Replace negative rows with 0 in StaticVectors Replace negative row indices with 0-vectors in `StaticVectors`. * Increase versions related to StaticVectors * Increase versions of all architctures and layers related to `StaticVectors` * Improve efficiency of 0-vector operations Parallel `spacy-legacy` PR: https://github.com/explosion/spacy-legacy/pull/5 * Update config defaults to new versions * Update docs	2021-04-22 18:04:15 +10:00
Sofie Van Landeghem	6f565cf39d	fix typo in entity_linker docs	2021-04-22 09:59:24 +02:00
Sofie Van Landeghem	2e746dbf32	update EL training data format in docs (#7839 ) * update EL training data format * fix typo * all -1 because reasons	2021-04-22 08:50:09 +02:00
Shantam Raj	6017fcf693	Default code for Setting Entity annotations on the website errors (#7738 ) * the default example for "Setting entity annotations" errors on Binder * updating contributer info * using a new variable to store original entities	2021-04-21 09:16:32 +02:00
langdonholmes	df541c6b5e	Update processing-pipelines.md to mention method for doc metadata (#7480 ) * Update processing-pipelines.md Under "things to try," inform users they can save metadata when using nlp.pipe(foobar, as_tuples=True) Link to a new example on the attributes page detailing the following: > ``` > data = [ > ("Some text to process", {"meta": "foo"}), > ("And more text...", {"meta": "bar"}) > ] > > for doc, context in nlp.pipe(data, as_tuples=True): > # Let's assume you have a "meta" extension registered on the Doc > doc._.meta = context["meta"] > ``` from https://stackoverflow.com/questions/57058798/make-spacy-nlp-pipe-process-tuples-of-text-and-additional-information-to-add-as * Updating the attributes section Update the attributes section with example of how extensions can be used to store metadata. * Update processing-pipelines.md * Update processing-pipelines.md Made as_tuples example executable and relocated to the end of the "Processing Text" section. * Update processing-pipelines.md * Update processing-pipelines.md Removed extra line * Reformat and rephrase Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2021-04-19 11:58:12 +02:00
Adriane Boyd	0e7f94b247	Update Tokenizer.explain with special matches (#7749 ) * Update Tokenizer.explain with special matches Update `Tokenizer.explain` and the pseudo-code in the docs to include the processing of special cases that contain affixes or whitespace. * Handle optional settings in explain * Add test for special matches in explain Add test for `Tokenizer.explain` for special cases containing affixes.	2021-04-19 19:08:20 +10:00
Sofie Van Landeghem	c786e98e56	assemble CLI command (#7783 ) * assemble CLI command * ensure assemble runs even without training section * cleanup	2021-04-19 18:39:11 +10:00
Bram Vanroy	ed561cf428	Terminology: deprecated vs obsolete (#7621 ) * Terminology: deprecated vs obsolete Typically, deprecated is used for functionality that is bound to become unavailable but that can still be used. Obsolete is used for features that have been removed. In E941, I think what is meant is "obsolete" since loading a model by a shortcut simply does not work anymore (and throws an error). This is different from downloading a model with a shortcut, which is deprecated but still works. In light of this, perhaps all other error codes should be checked as well. * clarify that the link command is removed and not just deprecated Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>	2021-04-12 14:37:00 +02:00
Adriane Boyd	673e2bc4c0	Add usage docs for streamed train corpora (#7693 )	2021-04-09 16:15:38 +02:00
Sofie Van Landeghem	204c2f116b	Extend score_spans for overlapping & non-labeled spans (#7209 ) * extend span scorer with consider_label and allow_overlap * unit test for spans y2x overlap * add score_spans unit test * docs for new fields in scorer.score_spans * rename to include_label * spell out if-else for clarity * rename to 'labeled' Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2021-04-08 12:19:17 +02:00
broaddeep	ee159b8543	Support match alignments (#7321 ) * Support match alignments * change naming from match_alignments to with_alignments, add conditional flow if with_alignments is given, validate with_alignments, add related test case * remove added errors, utilize bint type, cleanup whitespace * fix no new line in end of file * Minor formatting * Skip alignments processing if as_spans is set * Add with_alignments to Matcher API docs * Update website/docs/api/matcher.md Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-04-08 18:10:14 +10:00
Ines Montani	de4f4c9b8a	Add more link anchors [ci skip]	2021-04-06 14:15:21 +10:00
Ines Montani	5bbdd7dc4c	Update pipeline design docs [ci skip]	2021-04-06 14:13:22 +10:00
Ines Montani	1d1cfadbca	Fix formatting [ci skip]	2021-04-06 14:13:13 +10:00
Ayush Chaurasia	3c2ce41dd8	W&B integration: Optional support for dataset and model checkpoint logging and versioning (#7429 ) * Add optional artifacts logging * Update docs * Update spacy/training/loggers.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update spacy/training/loggers.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update spacy/training/loggers.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Bump WandbLogger Version * Add documentation of v1 to legacy docs * bump spacy-legacy to 3.0.2 (to be released) Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>	2021-04-01 19:36:23 +02:00
Sofie Van Landeghem	59c2069eb1	Legacy docs (#7601 ) * document legacy Tok2Vec architectures * add TextCatEnsemble.v1 legacy documentation * Separate legacy section in side bar	2021-03-30 12:43:14 +02:00
Santiago Castro	af07fc3bc1	Add support for CUDA 11.2 (#7583 ) * Add support for CUDA 11.2 * Update the docs * Format Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2021-03-30 09:47:33 +02:00
Álvaro Abella Bascarán	5b4dde38a3	fix fn name: tokenizer.infixes_finditer -> tokenizer.infix_finditer (#7606 )	2021-03-30 09:45:49 +02:00
Ines Montani	be55f43163	Merge pull request #7473 from adrianeboyd/docs/v3-pipeline-deps-order	2021-03-22 12:43:07 +01:00
Ines Montani	3ee2fcfba0	Merge pull request #7483 from adrianeboyd/docs/various-v3-4 [ci skip]	2021-03-22 12:37:06 +01:00
Ines Montani	88e5a0dc16	Merge pull request #7504 from polm/fix/lexeme-docs [ci skip] Fix mismatched backtick in Lexeme docs	2021-03-22 12:36:44 +01:00
Adriane Boyd	0d2b723e8d	Update entity setting section	2021-03-20 11:38:55 +01:00
Paul O'Leary McCann	e39c0dcf33	Fix mismatched backtick in Lexeme docs	2021-03-20 18:40:00 +09:00
Adriane Boyd	c771ec22f0	Update matcher errors and docs * Mention `tagger+attribute_ruler` in `POS`/`MORPH` error messages for `Matcher` and `PhraseMatcher` * Document `Matcher.__call__(allow_missing=)`	2021-03-19 10:11:18 +01:00
Adriane Boyd	6a9a467766	Update website/docs/usage/processing-pipelines.md Co-authored-by: Ines Montani <ines@ines.io>	2021-03-19 08:12:49 +01:00
Adriane Boyd	6354b642c5	Fix typo	2021-03-18 19:01:10 +01:00
Adriane Boyd	40e5d3a980	Update saving/loading example	2021-03-18 16:56:10 +01:00
Adriane Boyd	0fb1881f36	Reformat processing pipelines	2021-03-18 13:31:42 +01:00
Adriane Boyd	acc58719da	Update custom similarity hooks example	2021-03-18 13:31:42 +01:00
Adriane Boyd	c9e1a9ac17	Add multiprocessing section	2021-03-18 13:31:42 +01:00
Adriane Boyd	9a254d3995	Include all en_core_web_sm components in examples	2021-03-18 13:31:42 +01:00
Adriane Boyd	83c1b919a7	Fix positional/option in CLI types	2021-03-18 13:31:42 +01:00
Adriane Boyd	9fd41d6742	Remove Language.pipe cleanup arg	2021-03-18 13:31:42 +01:00
Adriane Boyd	5da323fd86	Minor edits	2021-03-17 12:59:05 +01:00
Adriane Boyd	a5ffe8dfed	Add details about pretrained pipeline design	2021-03-17 11:31:26 +01:00
bsweileh	61472e7cb3	Update _training.md - Fix broken link on backpropagation (#7431 ) * Update _training.md Fix broken link on backpropagation * Add agreement add spacy contributor agreement	2021-03-15 09:21:35 +01:00
Ines Montani	c67d5a6eb0	Merge pull request #7394 from adrianeboyd/docs/ner-example-data-readme	2021-03-13 04:26:18 +01:00
Ines Montani	068b97a617	Merge pull request #7408 from adrianeboyd/bugfix/load-keyword-only	2021-03-13 04:25:50 +01:00
Adriane Boyd	3168103605	Fix type of spacy train --output in docs	2021-03-12 10:04:57 +01:00

1 2 3 4 5 ...

1535 Commits