spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-01-02 21:36:36 +03:00

Author	SHA1	Message	Date
Paul O'Leary McCann	8867e60fbb	Update website/docs/usage/v3.md Co-authored-by: Ines Montani <ines@ines.io>	2021-07-29 14:56:56 +09:00
Adriane Boyd	8547514aa4	Remove labels from textcat component config example (#8815)	2021-07-27 13:14:38 +02:00
Paul O'Leary McCann	76ac95923a	Add note to migration guide about lexeme tables (fix #7290) This just adds the resolution from #6388 to the docs.	2021-07-27 19:19:25 +09:00
Paul O'Leary McCann	67ecdcc3ac	Update subset/superset docs (#8795) * Update subset/superset docs * Update website/docs/usage/rule-based-matching.md Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2021-07-27 12:08:46 +02:00
Ines Montani	134cb06af3	Merge pull request #8808 from kevinlu1248/master [ci skip] Changed a CLI command in data-formats.md due to erroneous information	2021-07-27 12:15:16 +10:00
Kevin Lu	4a8e9e4e4e	Update data-formats.md	2021-07-25 22:58:53 -07:00
Adriane Boyd	f5acc48111	Remove TrainablePipe as base class for Lemmatizer in API docs (#8725)	2021-07-15 16:41:36 +02:00
Sofie Van Landeghem	77859beb99	spacy.ngram_range_suggester.v1 (#8699)	2021-07-15 10:01:22 +02:00
Ines Montani	50000d37e4	Avoid double parentheses [ci skip]	2021-07-10 10:52:01 +10:00
Calum Sieppert	e2d53aa1a6	Typo fixes	2021-07-09 10:25:56 -06:00
Ines Montani	39c8f7949e	Add code preview for textcat_multilabel [ci skip]	2021-07-08 13:33:25 +10:00
Calum Sieppert	889c187bc2	Typo fixes	2021-07-07 16:53:04 -06:00
Adriane Boyd	6db647dfe0	Update v3.1 usage docs	2021-07-07 08:43:33 +02:00
Sofie Van Landeghem	64fac754fe	add spacy prefix to ngram_suggester.v1 (#8623)	2021-07-07 08:09:30 +02:00
Sofie Van Landeghem	e7d747e3ee	TransitionBasedParser.v1 to legacy (#8586) * TransitionBasedParser.v1 to legacy * register sublayers * bump spacy-legacy to 3.0.7	2021-07-06 15:26:45 +02:00
Ines Montani	04a9ade40f	Merge pull request #8466 from explosion/docs/new-in-v3-1 [ci skip]	2021-07-06 22:20:24 +10:00
Sofie Van Landeghem	b9f59118bf	Fix silent evaluation (#8581) * fix silentness * sneak in docs typo fix * pass silent boolean instead	2021-07-06 14:16:19 +02:00
Adriane Boyd	29906884c5	Raise an error for textcat with <2 labels (#8584) * Raise an error for textcat with <2 labels Raise an error if initializing a `textcat` component without at least two labels. * Add similar note to docs * Update positive_label description in API docs	2021-07-06 12:35:22 +02:00
Ines Montani	5bb7fe4b41	Update with HF hub integration [ci skip]	2021-07-06 19:30:59 +10:00
Cass	7d13fc799b	Fix a command typo in models.md "dowmload" -> "download"	2021-07-05 18:44:18 -07:00
Ines Montani	8423864b50	Add docs notes on installing models from Python and in Jupyter [ci skip] (#8597)	2021-07-05 13:49:20 +02:00
Ines Montani	af9d984407	Merge pull request #8405 from svlandeg/fix/whitespace_tokenizer [ci skip]	2021-06-30 20:52:59 +10:00
Adriane Boyd	41292a1b84	Add note about updating with fill-config	2021-06-29 10:45:36 +02:00
Adriane Boyd	4d1ef8f695	Tidy up docs	2021-06-28 12:08:15 +02:00
Ines Montani	4544412442	Update wording [ci skip] Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-06-25 13:52:48 +10:00
Ines Montani	0d2e2b59bc	Update intro [ci skip]	2021-06-24 22:53:20 +10:00
Matthew Honnibal	f9946154d9	Add SpanCategorizer component (#6747) * Draft spancat model * Add spancat model * Add test for extract_spans * Add extract_spans layer * Upd extract_spans * Add spancat model * Add test for spancat model * Upd spancat model * Update spancat component * Upd spancat * Update spancat model * Add quick spancat test * Import SpanCategorizer * Fix SpanCategorizer component * Import SpanGroup * Fix span extraction * Fix import * Fix import * Upd model * Update spancat models * Add scoring, update defaults * Update and add docs * Fix type * Update spacy/ml/extract_spans.py * Auto-format and fix import * Fix comment * Fix type * Fix type * Update website/docs/api/spancategorizer.md * Fix comment Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Better defense Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Fix labels list Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update spacy/ml/extract_spans.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update spacy/pipeline/spancat.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Set annotations during update * Set annotations in spancat * fix imports in test * Update spacy/pipeline/spancat.py * replace MaxoutLogistic with LinearLogistic * fix config * various small fixes * remove set_annotations parameter in update * use our beloved tupley format with recent support for doc.spans * bugfix to allow renaming the default span_key (scores weren't showing up) * use different key in docs example * change defaults to better-working parameters from project (WIP) * register spacy.extract_spans.v1 for legacy purposes * Upd dev version so can build wheel * layers instead of architectures for smaller building blocks * Update website/docs/api/spancategorizer.md Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Update website/docs/api/spancategorizer.md Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Include additional scores from overrides in combined score weights * Parameterize spans key in scoring Parameterize the `SpanCategorizer` `spans_key` for scoring purposes so that it's possible to evaluate multiple `spancat` components in the same pipeline. * Use the (intentionally very short) default spans key `sc` in the `SpanCategorizer` * Adjust the default score weights to include the default key * Adjust the scorer to use `spans_{spans_key}` as the prefix for the returned score * Revert addition of `attr_name` argument to `score_spans` and adjust the key in the `getter` instead. Note that for `spancat` components with a custom `span_key`, the score weights currently need to be modified manually in `[training.score_weights]` for them to be available during training. To suppress the default score weights `spans_sc_p/r/f` during training, set them to `null` in `[training.score_weights]`. * Update website/docs/api/scorer.md * Fix scorer for spans key containing underscore * Increment version * Add Spans to Evaluate CLI (#8439) * Add Spans to Evaluate CLI * Change to spans_key * Add spans per_type output Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> * Fix spancat GPU issues (#8455) * Fix GPU issues * Require thinc >=8.0.6 * Switch to glorot_uniform_init * Fix and test ngram suggester * Include final ngram in doc for all sizes * Fix ngrams for docs of the same length as ngram size * Handle batches of docs that result in no ngrams * Add tests Co-authored-by: Ines Montani <ines@ines.io> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: Nirant <NirantK@users.noreply.github.com>	2021-06-24 12:35:27 +02:00
Ines Montani	68721af628	Formatting and preliminary intro [ci skip]	2021-06-24 20:32:23 +10:00
Adriane Boyd	92dc6b409e	Notes on source with vectors	2021-06-24 10:34:07 +02:00
Adriane Boyd	35425d7e26	Add details for Catalan and Danish	2021-06-24 10:10:33 +02:00
Ines Montani	5daf450f51	Update upgrading notes [ci skip]	2021-06-24 18:06:28 +10:00
Ines Montani	528746129d	Merge branch 'master' into docs/new-in-v3-1	2021-06-24 13:11:37 +10:00
Ines Montani	a8e8d02ba7	Merge pull request #8465 from explosion/feature/spacy-package-readme	2021-06-24 13:11:08 +10:00
Ines Montani	3e058dee62	Update features [ci skip]	2021-06-24 12:36:04 +10:00
Ines Montani	40f13c3f0c	Add docs [ci skip]	2021-06-24 11:57:15 +10:00
Ines Montani	a1e4aca267	Fix sentence [ci skip]	2021-06-24 11:40:36 +10:00
Ines Montani	ca0d904faa	Update details [ci skip]	2021-06-23 13:05:56 +10:00
themrmax	d96c422cfc	Fix broken link change /api/registry to /api/top-level#registry	2021-06-22 15:34:06 -07:00
Ines Montani	e9b68d4f4c	Update details and add example [ci skip]	2021-06-22 17:51:03 +10:00
Nick Sorros	31504f5982	Switch model and data path in prodigy project.yml recipe (#8467)	2021-06-22 09:41:45 +02:00
Ines Montani	bc93c34f54	Add "New in v3.1" guide	2021-06-22 15:23:18 +10:00
Adriane Boyd	e39d1bd4ab	Various docs updates for v3.1 (#8406) * Update for Catalan/Italian lemmatizer changes * Add warning about relevance of section	2021-06-21 09:33:50 +02:00
Ines Montani	02d2fdb123	Add link anchor [ci skip]	2021-06-20 11:29:19 +10:00
Matthew Honnibal	6f5e308d17	Support negative examples in partial NER annotations (#8106) * Support a cfg field in transition system * Make NER 'has gold' check use right alignment for span * Pass 'negative_samples_key' property into NER transition system * Add field for negative samples to NER transition system * Check neg_key in NER has_gold * Support negative examples in NER oracle * Test for negative examples in NER * Fix name of config variable in NER * Remove vestiges of old-style partial annotation * Remove obsolete tests * Add comment noting lack of support for negative samples in parser * Additions to "neg examples" PR (#8201) * add custom error and test for deprecated format * add test for unlearning an entity * add break also for Begin's cost * add negative_samples_key property on Parser * rename * extend docs & fix some older docs issues * add subclass constructors, clean up tests, fix docs * add flaky test with ValueError if gold parse was not found * remove ValueError if n_gold == 0 * fix docstring * Hack in environment variables to try out training * Remove hack * Remove NER hack, and support 'negative O' samples * Fix O oracle * Fix transition parser * Remove 'not O' from oracle * Fix NER oracle * check for spans in both gold.ents and gold.spans and raise if so, to prevent memory access violation * use set instead of list in consistency check Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-06-17 17:33:00 +10:00
svlandeg	bb9d2f1546	extend example to ensure the text is preserved	2021-06-16 23:56:35 +02:00
Sofie Van Landeghem	e796aab4b3	Resizable textcat (#7862) * implement textcat resizing for TextCatCNN * resizing textcat in-place * simplify code * ensure predictions for old textcat labels remain the same after resizing (WIP) * fix for softmax * store softmax as attr * fix ensemble weight copy and cleanup * restructure slightly * adjust documentation, update tests and quickstart templates to use latest versions * extend unit test slightly * revert unnecessary edits * fix typo * ensemble architecture won't be resizable for now * use resizable layer (WIP) * revert using resizable layer * resizable container while avoid shape inference trouble * cleanup * ensure model continues training after resizing * use fill_b parameter * use fill_defaults * resize_layer callback * format * bump thinc to 8.0.4 * bump spacy-legacy to 3.0.6	2021-06-16 11:45:00 +02:00
svlandeg	29d83dec0c	adjust whitespace tokenizer to avoid sep in split()	2021-06-16 10:58:45 +02:00
Adriane Boyd	5646fcbe46	Merge remote-tracking branch 'upstream/develop' into chore/develop-into-master-v3.1	2021-06-15 15:05:17 +02:00
Sofie Van Landeghem	0fd0d949c4	fix 's typo's across code base (#8384)	2021-06-15 10:57:08 +02:00
Adriane Boyd	507422149f	Various docs updates for v3.0 (#8353) * Update cats score names in Scorer API docs * Refer to performance in meta * Update package naming/versions, lemmatizer details * Minor formatting fixes * Provide more explanation for cats_score_desc * Provide language-specific lemmatizer defaults in API docs Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>	2021-06-14 12:19:36 +02:00
Sofie Van Landeghem	3c58c0323f	fix docs (#8200)	2021-05-27 10:48:59 +02:00
Paul O'Leary McCann	0c553ecd4e	Fix docs (fix #8189)	2021-05-24 19:47:30 +09:00
Sofie Van Landeghem	202943bc8c	KB & NEL to/from bytes (#8113) * unit test for pickling KB * add pickling test for NEL * KB to_bytes and from_bytes * NEL to_bytes and from_bytes * xfail pickle tests for now * fix docs * cleanup	2021-05-20 18:11:30 +10:00
Adriane Boyd	6baab565eb	Minor updates to quickstart settings/instructions (#7965) * Minor updates to quickstart settings/instructions * set default value of textcat exclusive to `false` until the default checkbox behavior is updated * add the `morphologizer` to the list of components * add a note that v3.0.6+ is required * Switch to warning above quickstart * Undo changes to textcat default in quickstart Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-05-17 16:55:22 +02:00
Adriane Boyd	71c2a3ab47	Fix new version for match_alignments (#8021)	2021-05-07 09:55:20 +02:00
Sofie Van Landeghem	02a6a5fea0	Fix 'debug model' for transformers + generalize (#7973) * add overrides to docs * fix debug model with transformer * assume training data is set in config	2021-05-06 18:43:32 +10:00
Paul O'Leary McCann	66bfabd839	Fix pretraining objectives fragment (#8005) * Fix pretraining objectives fragment The fragment here is reused from a heading higher up, so you couldn't link to this section. * Fix section link to new fragment	2021-05-06 08:27:36 +02:00
Adriane Boyd	2320791f6d	Fix Transformer.initialize example (#7963)	2021-04-30 12:21:31 +02:00
Adriane Boyd	95c0833656	Add training option to set annotations on update (#7767) * Add training option to set annotations on update Add a `[training]` option called `set_annotations_on_update` to specify a list of components for which the predicted annotations should be set on `example.predicted` immediately after that component has been updated. The predicted annotations can be accessed by later components in the pipeline during the processing of the batch in the same `update` call. * Rename to annotates / annotating_components * Add test for `annotating_components` when training from config * Add documentation	2021-04-26 16:53:53 +02:00
Adriane Boyd	bdb485cc80	Add callback to copy vocab/tokenizer from model (#7750) * Add callback to copy vocab/tokenizer from model Add callback `spacy.copy_from_base_model.v1` to copy the tokenizer settings and/or vocab (including vectors) from a base model. * Move spacy.copy_from_base_model.v1 to spacy.training.callbacks * Add documentation * Modify to specify model as tokenizer and vocab params	2021-04-22 12:36:50 +02:00
Adriane Boyd	f68fc29130	Update sent_starts in Example.from_dict (#7847) * Update sent_starts in Example.from_dict Update `sent_starts` for `Example.from_dict` so that `Optional[bool]` values have the same meaning as for `Token.is_sent_start`. Use `Optional[bool]` as the type for sent start values in the docs. * Use helper function for conversion to ternary ints	2021-04-22 11:32:45 +02:00
Adriane Boyd	d2bdaa7823	Replace negative rows with 0 in StaticVectors (#7674) * Replace negative rows with 0 in StaticVectors Replace negative row indices with 0-vectors in `StaticVectors`. * Increase versions related to StaticVectors * Increase versions of all architectures and layers related to `StaticVectors` * Improve efficiency of 0-vector operations Parallel `spacy-legacy` PR: https://github.com/explosion/spacy-legacy/pull/5 * Update config defaults to new versions * Update docs	2021-04-22 18:04:15 +10:00
Sofie Van Landeghem	6f565cf39d	fix typo in entity_linker docs	2021-04-22 09:59:24 +02:00
Sofie Van Landeghem	2e746dbf32	update EL training data format in docs (#7839) * update EL training data format * fix typo * all -1 because reasons	2021-04-22 08:50:09 +02:00
Shantam Raj	6017fcf693	Default code for Setting Entity annotations on the website errors (#7738) * the default example for "Setting entity annotations" errors on Binder * updating contributor info * using a new variable to store original entities	2021-04-21 09:16:32 +02:00
langdonholmes	df541c6b5e	Update processing-pipelines.md to mention method for doc metadata (#7480) * Update processing-pipelines.md Under "things to try," inform users they can save metadata when using nlp.pipe(foobar, as_tuples=True) Link to a new example on the attributes page detailing the following: > ``` > data = [ > ("Some text to process", {"meta": "foo"}), > ("And more text...", {"meta": "bar"}) > ] > > for doc, context in nlp.pipe(data, as_tuples=True): > # Let's assume you have a "meta" extension registered on the Doc > doc._.meta = context["meta"] > ``` from https://stackoverflow.com/questions/57058798/make-spacy-nlp-pipe-process-tuples-of-text-and-additional-information-to-add-as * Updating the attributes section Update the attributes section with example of how extensions can be used to store metadata. * Update processing-pipelines.md * Update processing-pipelines.md Made as_tuples example executable and relocated to the end of the "Processing Text" section. * Update processing-pipelines.md * Update processing-pipelines.md Removed extra line * Reformat and rephrase Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2021-04-19 11:58:12 +02:00
Adriane Boyd	0e7f94b247	Update Tokenizer.explain with special matches (#7749) * Update Tokenizer.explain with special matches Update `Tokenizer.explain` and the pseudo-code in the docs to include the processing of special cases that contain affixes or whitespace. * Handle optional settings in explain * Add test for special matches in explain Add test for `Tokenizer.explain` for special cases containing affixes.	2021-04-19 19:08:20 +10:00
Sofie Van Landeghem	c786e98e56	assemble CLI command (#7783) * assemble CLI command * ensure assemble runs even without training section * cleanup	2021-04-19 18:39:11 +10:00
Bram Vanroy	ed561cf428	Terminology: deprecated vs obsolete (#7621) * Terminology: deprecated vs obsolete Typically, deprecated is used for functionality that is bound to become unavailable but that can still be used. Obsolete is used for features that have been removed. In E941, I think what is meant is "obsolete" since loading a model by a shortcut simply does not work anymore (and throws an error). This is different from downloading a model with a shortcut, which is deprecated but still works. In light of this, perhaps all other error codes should be checked as well. * clarify that the link command is removed and not just deprecated Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>	2021-04-12 14:37:00 +02:00
Adriane Boyd	673e2bc4c0	Add usage docs for streamed train corpora (#7693)	2021-04-09 16:15:38 +02:00
Sofie Van Landeghem	204c2f116b	Extend score_spans for overlapping & non-labeled spans (#7209) * extend span scorer with consider_label and allow_overlap * unit test for spans y2x overlap * add score_spans unit test * docs for new fields in scorer.score_spans * rename to include_label * spell out if-else for clarity * rename to 'labeled' Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2021-04-08 12:19:17 +02:00
broaddeep	ee159b8543	Support match alignments (#7321) * Support match alignments * change naming from match_alignments to with_alignments, add conditional flow if with_alignments is given, validate with_alignments, add related test case * remove added errors, utilize bint type, cleanup whitespace * fix no new line in end of file * Minor formatting * Skip alignments processing if as_spans is set * Add with_alignments to Matcher API docs * Update website/docs/api/matcher.md Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-04-08 18:10:14 +10:00
Ines Montani	de4f4c9b8a	Add more link anchors [ci skip]	2021-04-06 14:15:21 +10:00
Ines Montani	5bbdd7dc4c	Update pipeline design docs [ci skip]	2021-04-06 14:13:22 +10:00
Ines Montani	1d1cfadbca	Fix formatting [ci skip]	2021-04-06 14:13:13 +10:00
Ayush Chaurasia	3c2ce41dd8	W&B integration: Optional support for dataset and model checkpoint logging and versioning (#7429) * Add optional artifacts logging * Update docs * Update spacy/training/loggers.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update spacy/training/loggers.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update spacy/training/loggers.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Bump WandbLogger Version * Add documentation of v1 to legacy docs * bump spacy-legacy to 3.0.2 (to be released) Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>	2021-04-01 19:36:23 +02:00
Sofie Van Landeghem	59c2069eb1	Legacy docs (#7601) * document legacy Tok2Vec architectures * add TextCatEnsemble.v1 legacy documentation * Separate legacy section in side bar	2021-03-30 12:43:14 +02:00
Santiago Castro	af07fc3bc1	Add support for CUDA 11.2 (#7583) * Add support for CUDA 11.2 * Update the docs * Format Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2021-03-30 09:47:33 +02:00
Álvaro Abella Bascarán	5b4dde38a3	fix fn name: tokenizer.infixes_finditer -> tokenizer.infix_finditer (#7606)	2021-03-30 09:45:49 +02:00
Ines Montani	be55f43163	Merge pull request #7473 from adrianeboyd/docs/v3-pipeline-deps-order	2021-03-22 12:43:07 +01:00
Ines Montani	3ee2fcfba0	Merge pull request #7483 from adrianeboyd/docs/various-v3-4 [ci skip]	2021-03-22 12:37:06 +01:00
Ines Montani	88e5a0dc16	Merge pull request #7504 from polm/fix/lexeme-docs [ci skip] Fix mismatched backtick in Lexeme docs	2021-03-22 12:36:44 +01:00
Adriane Boyd	0d2b723e8d	Update entity setting section	2021-03-20 11:38:55 +01:00
Paul O'Leary McCann	e39c0dcf33	Fix mismatched backtick in Lexeme docs	2021-03-20 18:40:00 +09:00
Adriane Boyd	c771ec22f0	Update matcher errors and docs * Mention `tagger+attribute_ruler` in `POS`/`MORPH` error messages for `Matcher` and `PhraseMatcher` * Document `Matcher.__call__(allow_missing=)`	2021-03-19 10:11:18 +01:00
Adriane Boyd	6a9a467766	Update website/docs/usage/processing-pipelines.md Co-authored-by: Ines Montani <ines@ines.io>	2021-03-19 08:12:49 +01:00
Adriane Boyd	6354b642c5	Fix typo	2021-03-18 19:01:10 +01:00
Adriane Boyd	40e5d3a980	Update saving/loading example	2021-03-18 16:56:10 +01:00
Adriane Boyd	0fb1881f36	Reformat processing pipelines	2021-03-18 13:31:42 +01:00
Adriane Boyd	acc58719da	Update custom similarity hooks example	2021-03-18 13:31:42 +01:00
Adriane Boyd	c9e1a9ac17	Add multiprocessing section	2021-03-18 13:31:42 +01:00
Adriane Boyd	9a254d3995	Include all en_core_web_sm components in examples	2021-03-18 13:31:42 +01:00
Adriane Boyd	83c1b919a7	Fix positional/option in CLI types	2021-03-18 13:31:42 +01:00
Adriane Boyd	9fd41d6742	Remove Language.pipe cleanup arg	2021-03-18 13:31:42 +01:00
Adriane Boyd	5da323fd86	Minor edits	2021-03-17 12:59:05 +01:00
Adriane Boyd	a5ffe8dfed	Add details about pretrained pipeline design	2021-03-17 11:31:26 +01:00
bsweileh	61472e7cb3	Update _training.md - Fix broken link on backpropagation (#7431) * Update _training.md Fix broken link on backpropagation * Add agreement add spacy contributor agreement	2021-03-15 09:21:35 +01:00
Ines Montani	c67d5a6eb0	Merge pull request #7394 from adrianeboyd/docs/ner-example-data-readme	2021-03-13 04:26:18 +01:00
Ines Montani	068b97a617	Merge pull request #7408 from adrianeboyd/bugfix/load-keyword-only	2021-03-13 04:25:50 +01:00
Adriane Boyd	3168103605	Fix type of spacy train --output in docs	2021-03-12 10:04:57 +01:00
Adriane Boyd	03e9e7b567	Add --code option to init fill-config	2021-03-12 10:03:57 +01:00
Adriane Boyd	124304b146	Add vocab kwarg back to spacy.load * Additional minor formatting and docs cleanup	2021-03-11 10:58:59 +01:00
Adriane Boyd	84470d9b9e	Incorporate BILUO note from #7407	2021-03-11 10:11:21 +01:00
Adriane Boyd	4294bcf4ab	Align keyword-only in docs for init/util	2021-03-11 09:52:40 +01:00
Adriane Boyd	28726c25a1	Update docs for convert CLI and NER examples	2021-03-10 11:42:02 +01:00
Adriane Boyd	d746ea6278	Add warning about GPU selection in Jupyter notebooks (#7075) * Initial warning * Update check * Redo edit * Move jupyter warning to helper method * Add link with details to warnings	2021-03-09 15:35:21 +01:00
Sofie Van Landeghem	932887b950	textcat scoring fix and multi_label docs (#6974) * add multi-label textcat to menu * add infobox on textcat API * add info to v3 migration guide * small edits * further fixes in doc strings * add infobox to textcat architectures * add textcat_multilabel to overview of built-in components * spelling * fix unrelated warn msg * Add textcat_multilabel to quickstart [ci skip] * remove separate documentation page for multilabel_textcategorizer * small edits * positive label clarification * avoid duplicating information in self.cfg and fix textcat.score * fix multilabel textcat too * revert threshold to storage in cfg * revert threshold stuff for multi-textcat Co-authored-by: Ines Montani <ines@ines.io>	2021-03-09 23:04:22 +11:00
Sofie Van Landeghem	cd70c3cb79	Fixing pretrain (#7342) * initialize NLP with train corpus * add more pretraining tests * more tests * function to fetch tok2vec layer for pretraining * clarify parameter name * test different objectives * formatting * fix check for static vectors when using vectors objective * clarify docs * logger statement * fix init_tok2vec and proc.initialize order * test training after pretraining * add init_config tests for pretraining * pop pretraining block to avoid config validation errors * custom errors	2021-03-09 14:01:13 +11:00
Ines Montani	dfb23a419e	Merge branch 'spacy.io' [ci skip]	2021-03-06 17:38:54 +11:00
graue70	7d085d5b1c	Fix typo in docs	2021-03-05 18:30:09 +01:00
svlandeg	682a6232e3	fix typo	2021-03-02 17:59:13 +01:00
svlandeg	d900c55061	consistently use registry as callable	2021-03-02 17:56:28 +01:00
graue70	0fddc0447c	Fix copy & paste error in API docs	2021-03-02 14:00:14 +01:00
Ines Montani	8f7c7b2658	Merge pull request #7211 from svlandeg/docs/el_update [ci skip] kb.get_candidates renamed to get_alias_candidates	2021-02-27 11:51:22 +11:00
Ines Montani	408b94887a	Merge pull request #7207 from adrianeboyd/docs/get-noun-chunks [ci skip] Extend docs related to Vocab.get_noun_chunks	2021-02-27 11:51:08 +11:00
svlandeg	248339039e	fix type in docs	2021-02-26 14:27:10 +01:00
svlandeg	08fd901a1b	kb.get_candidates renamed to get_alias_candidates	2021-02-25 20:09:36 +01:00
Adriane Boyd	6a37f343d5	Extend docs related to Vocab.get_noun_chunks	2021-02-25 16:38:21 +01:00
Ines Montani	24cecbb3f4	Merge pull request #7126 from adrianeboyd/docs/gpu-id-opt [ci skip] Add tip about --gpu-id to training quickstart	2021-02-24 22:34:17 +11:00
Ken	fa7ddc7f88	Update sentencizer documentation example with sentencizer pipe name (#7185)	2021-02-24 08:06:54 +01:00
Tocic	b1996a51a1	fix typo in models.md (#7157)	2021-02-22 09:00:38 +01:00
Sofie Van Landeghem	b92f81d5da	fix NEL config and IO, and n_sents functionality (#7100) * fix NEL config and IO, and n_sents functionality * add docs * fix test	2021-02-22 14:49:52 +11:00
Sofie Van Landeghem	ba5a50f62b	NEL docs & UX (#7129) * EL set_kb docs fix * custom warning for set_kb mistake	2021-02-22 11:04:22 +11:00
Adriane Boyd	7198be0f4b	Add tip about --gpu-id to training quickstart	2021-02-19 14:07:51 +01:00
Sofie Van Landeghem	709c9e75af	span.ent only returns first sentence (#7084) * return first sentence when span contains sentence boundary * docs fix * small fixes * cleanup	2021-02-19 23:02:38 +11:00
palandlom	9b82586699	var batch is useless (#7111) It seems that nlp.update(examples) should be nlp.update(batch)	2021-02-18 09:44:22 +01:00
Ines Montani	fc4fb6eb3a	Make v2.x docs more prominent [ci skip]	2021-02-17 23:42:27 +11:00
Ines Montani	6b9026a219	Merge pull request #7000 from explosion/feature/project-yml-overrides Support env vars and CLI overrides for project.yml	2021-02-11 12:31:45 +11:00
Peter Baumann	61b04a70d5	Run PhraseMatcher on Spans (#6918) * Add regression test * Run PhraseMatcher on Spans * Add test for PhraseMatcher on Spans and Docs * Add SCA * Add test with 3 matches in Doc, 1 match in Span * Update docs * Use doc.length for find_matches in tokenizer Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2021-02-10 23:43:32 +11:00
Ines Montani	c08b3f294c	Support env vars and CLI overrides for project.yml	2021-02-10 13:45:27 +11:00
svlandeg	9a7f33c916	final 3.0 benchmark numbers	2021-02-09 21:28:33 +01:00
Ines Montani	ca3f8386d7	Merge pull request #6975 from svlandeg/fix/link [ci skip] fix link	2021-02-09 14:34:11 +11:00
tarskiandhutch	e897e7aaad	Line 70: syntax error Original config definition treated dictionary key as a function argument.	2021-02-08 15:24:57 -05:00
svlandeg	bb7482bef8	fix link	2021-02-08 18:39:59 +01:00
Sofie Van Landeghem	6ed423c16c	reduce memory load when reading all vectors from file (#6945) * reduce memory load when reading all vectors from file * one more small typo fix	2021-02-07 08:05:43 +08:00
Ines Montani	433835d9b0	Merge pull request #6889 from adrianeboyd/docs/source-install-dup [ci skip]	2021-02-05 13:35:16 +11:00
svlandeg	7cda5605a0	add type	2021-02-03 13:13:58 +01:00
svlandeg	94929c2b98	small doc fixes	2021-02-03 13:10:22 +01:00
Ines Montani	2cdfcd2d19	Update naming [ci skip]	2021-02-03 12:48:31 +11:00
Adriane Boyd	37a68a06ab	Update to recommend editable installs for source installs	2021-02-02 16:51:27 +01:00
Adriane Boyd	3a3e4daf60	Update install instructions * Remove duplicate section about compiling from source	2021-02-02 14:44:15 +01:00
Pengcheng YIN	6fdc33203a	Fix a typo	2021-02-01 17:26:28 -05:00
Ines Montani	a59f3fcf5d	Make wheel the default format and update docs [ci skip]	2021-02-01 23:18:43 +11:00
Ines Montani	31b842d6ce	Update table [ci skip]	2021-02-01 14:17:52 +11:00
Ines Montani	7752f80f39	Update docs [ci skip]	2021-01-31 16:11:24 +11:00
Ines Montani	a8a1231ccd	Update README and docs [ci skip]	2021-01-31 12:36:04 +11:00
Ines Montani	45c551037d	Update CLI docs [ci skip]	2021-01-30 21:50:23 +11:00
Ines Montani	ae07416fda	Merge branch 'website/v3-launch' into develop	2021-01-30 20:31:06 +11:00
Ines Montani	2332c4280b	Update and use unified --build option	2021-01-30 13:11:36 +11:00
Ines Montani	2609ba4e89	Support building wheel in spacy package	2021-01-30 11:54:02 +11:00
Ines Montani	95e958a229	Merge pull request #6852 from explosion/feature/replace-listeners	2021-01-30 00:58:08 +11:00
Ines Montani	7694f76dd1	Update warning and mention replace_listeners	2021-01-29 23:46:01 +11:00
Adriane Boyd	8b76cb8095	Rephrase transformers PyTorch instructions	2021-01-29 13:36:56 +01:00
Ines Montani	095055ac48	Merge pull request #6855 from adrianeboyd/docs/trf-sentencepiece [ci skip] Update transformers install docs	2021-01-29 23:34:01 +11:00
Adriane Boyd	e3e87e7275	Update transformers install docs * Recommend installing PyTorch separately * Add instructions for `sentencepiece`	2021-01-29 13:27:43 +01:00
Ines Montani	e766e8c56d	Apply suggestions from code review Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-01-29 21:41:17 +11:00
svlandeg	d7d838281c	adding new="3" mentions in the doc	2021-01-29 11:26:37 +01:00
Ines Montani	99af9e7125	Update documentation	2021-01-29 18:45:48 +11:00
Sofie Van Landeghem	24a697abb8	avoid empty aliases and improve UX and docs (#6840)	2021-01-29 08:51:40 +08:00
Sofie Van Landeghem	837a4f53c2	Error handling in nlp.pipe (#6817) * add error handler for pipe methods * add unit tests * remove pipe method that are the same as their base class * have Language keep track of a default error handler * cleanup * formatting * small refactor * add documentation	2021-01-29 08:51:21 +08:00
Ines Montani	230e651ad6	Merge branch 'develop' into master-tmp	2021-01-27 13:26:29 +11:00
Ines Montani	634ae609b4	Adjust formatting [ci skip]	2021-01-27 13:08:00 +11:00
Ines Montani	5d79d1af50	Merge pull request #6796 from svlandeg/docs/benchmarks [ci skip]	2021-01-27 13:01:23 +11:00
Ines Montani	1ed7029d47	Update website for v3 launch	2021-01-27 12:39:47 +11:00
Adriane Boyd	c447aa2b98	Update --code arg in evaluate CLI docs	2021-01-26 15:30:46 +01:00
jganseman	907bce7a78	Merge pull request #1 from jganseman/patch-1 Patch 1	2021-01-26 11:12:30 +01:00
jganseman	8bc57ec372	also update is_oov in lexeme docs	2021-01-26 11:09:16 +01:00
jganseman	1f2b0ec168	proposing a more concise explanation for is_oov	2021-01-26 10:53:39 +01:00
Matthew Honnibal	f049df1715	Revert "Set annotations in update" (#6810 ) * Revert "Set annotations in update (#6767)" This reverts commit `e680efc7cc`. * Fix version * Update spacy/pipeline/entity_linker.py * Update spacy/pipeline/entity_linker.py * Update spacy/pipeline/tagger.pyx * Update spacy/pipeline/tok2vec.py * Update spacy/pipeline/tok2vec.py * Update spacy/pipeline/transition_parser.pyx * Update spacy/pipeline/transition_parser.pyx * Update website/docs/api/multilabel_textcategorizer.md * Update website/docs/api/tok2vec.md * Update website/docs/usage/layers-architectures.md * Update website/docs/usage/layers-architectures.md * Update website/docs/api/transformer.md * Update website/docs/api/textcategorizer.md * Update website/docs/api/tagger.md * Update spacy/pipeline/entity_linker.py * Update website/docs/api/sentencerecognizer.md * Update website/docs/api/pipe.md * Update website/docs/api/morphologizer.md * Update website/docs/api/entityrecognizer.md * Update spacy/pipeline/entity_linker.py * Update spacy/pipeline/multitask.pyx * Update spacy/pipeline/tagger.pyx * Update spacy/pipeline/tagger.pyx * Update spacy/pipeline/textcat.py * Update spacy/pipeline/textcat.py * Update spacy/pipeline/textcat.py * Update spacy/pipeline/tok2vec.py * Update spacy/pipeline/trainable_pipe.pyx * Update spacy/pipeline/trainable_pipe.pyx * Update spacy/pipeline/transition_parser.pyx * Update spacy/pipeline/transition_parser.pyx * Update website/docs/api/entitylinker.md * Update website/docs/api/dependencyparser.md * Update spacy/pipeline/trainable_pipe.pyx	2021-01-25 22:18:45 +08:00
Adriane Boyd	61c9f8bf24	Remove transformers model max length section (#6807 )	2021-01-25 19:59:34 +08:00
svlandeg	56064faed9	update caption	2021-01-23 00:57:00 +01:00
svlandeg	d7c0f40a96	update comment	2021-01-22 18:55:18 +01:00
svlandeg	a071279bc7	add speed comparison to docs	2021-01-22 18:46:35 +01:00
svlandeg	b132cb3036	update accuracies for new a1 models	2021-01-21 20:24:05 +01:00
Adriane Boyd	d0236136a2	Fix default config init in Transformer API docs (#6781 )	2021-01-21 23:18:03 +08:00
Sofie Van Landeghem	e680efc7cc	Set annotations in update (#6767 ) * bump to 3.0.0rc4 * do set_annotations in component update calls * update docs and remove set_annotations flag * fix EL test	2021-01-20 11:49:25 +11:00
Sofie Van Landeghem	57640aa838	warn when frozen components break listener pattern (#6766 ) * warn when frozen components break listener pattern * few notes in the documentation * update arg name * formatting * cleanup * specify listeners return type	2021-01-20 11:12:35 +11:00
Ines Montani	4a1029a9b6	Add infobox [ci skip]	2021-01-19 19:18:39 +11:00
Ines Montani	f50502dad7	Update docs [ci skip]	2021-01-19 00:22:47 +11:00
Sofie Van Landeghem	fed8f48965	raise NotImplementedError when noun_chunks iterator is not implemented (#6711 ) * raise NotImplementedError when noun_chunks iterator is not implemented * bring back, fix and document span.noun_chunks * formatting Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>	2021-01-17 19:56:05 +08:00
Adriane Boyd	bf0cdae8d4	Add token_splitter component (#6726 ) * Add long_token_splitter component Add a `long_token_splitter` component for use with transformer pipelines. This component splits up long tokens like URLs into smaller tokens. This is particularly relevant for pretrained pipelines with `strided_spans`, since the user can't change the length of the span `window` and may not wish to preprocess the input texts. The `long_token_splitter` splits tokens that are at least `long_token_length` tokens long into smaller tokens of `split_length` size. Notes: * Since this is intended for use as the first component in a pipeline, the token splitter does not try to preserve any token annotation. * API docs to come when the API is stable. * Adjust API, add test * Fix name in factory	2021-01-17 19:54:41 +08:00
Adriane Boyd	9328dd5625	Handle unset token.morph in Morphologizer (#6704 ) * Handle unset token.morph in Morphologizer Handle unset `token.morph` in `Morphologizer.initialize` and `Morphologizer.get_loss`. If both `token.morph` and `token.pos` are unset, treat the annotation as missing rather than empty. * Add token.has_morph()	2021-01-15 17:20:10 +01:00
Adriane Boyd	0c936004d1	Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-rc3	2021-01-14 11:49:58 +01:00
Matthew Honnibal	f277bfdf0f	Add SpanGroup and Graph container types to represent arbitrary annotations (#6696 ) * Draft out initial Spans data structure * Initial span group commit * Basic span group support on Doc * Basic test for span group * Compile span_group.pyx * Draft addition of SpanGroup to DocBin * Add deserialization for SpanGroup * Add tests for serializing SpanGroup * Fix serialization of SpanGroup * Add EdgeC and GraphC structs * Add draft Graph data structure * Compile graph * More work on Graph * Update GraphC * Upd graph * Fix walk functions * Let Graph take nodes and edges on construction * Fix walking and getting * Add graph tests * Fix import * Add module with the SpanGroups dict thingy * Update test * Rename 'span_groups' attribute * Try to fix c++11 compilation * Fix test * Update DocBin * Try to fix compilation * Try to fix graph * Improve SpanGroup docstrings * Add doc.spans to documentation * Fix serialization * Tidy up and add docs * Update docs [ci skip] * Add SpanGroup.has_overlap * WIP updated Graph API * Start testing new Graph API * Update Graph tests * Update Graph * Add docstring Co-authored-by: Ines Montani <ines@ines.io>	2021-01-14 17:30:41 +11:00
Adriane Boyd	a45d89f09a	Add initialize.before_init and after_init callbacks Add `initialize.before_init` and `initialize.after_init` callbacks to the config. The `initialize.before_init` callback is a place to implement one-time tokenizer customizations that are then saved with the model.	2021-01-12 13:07:44 +01:00
Sofie Van Landeghem	a612a5ba3f	fix small typos (#6698 )	2021-01-08 09:39:47 +01:00
Sofie Van Landeghem	75d9019343	Fix types of Tok2Vec encoding architectures (#6442 ) * fix TorchBiLSTMEncoder documentation * ensure the types of the encoding Tok2vec layers are correct * update references from v1 to v2 for the new architectures	2021-01-07 16:39:27 +11:00
Sofie Van Landeghem	82ae95267a	Docs for pretrain architectures (#6605 ) * document pretraining architectures * formatting * bit more info * small fixes	2021-01-06 16:12:30 +11:00
Sofie Van Landeghem	afc5714d32	multi-label textcat component (#6474 ) * multi-label textcat component * formatting * fix comment * cleanup * fix from #6481 * random edit to push the tests * add explicit error when textcat is called with multi-label gold data * fix error nr * small fix	2021-01-06 13:07:14 +11:00
Ines Montani	6f83abb971	Merge pull request #6647 from svlandeg/feature/init_config_overwrite	2021-01-05 14:59:04 +11:00
Ines Montani	3614472e29	Merge pull request #6646 from svlandeg/feature/cli-docs [ci skip]	2021-01-05 13:52:49 +11:00
Ines Montani	9c078a5885	Update formatting for consistency [ci skip]	2021-01-05 13:52:28 +11:00
Ines Montani	a9e845426f	Use --force for consistency and add docs	2021-01-05 13:49:59 +11:00
svlandeg	d5ff0fecf8	add docs	2020-12-30 14:01:13 +01:00
svlandeg	2fa23b0304	fix capitalization for link	2020-12-29 15:01:22 +01:00
svlandeg	43cc6aea93	remove non-existing link	2020-12-29 14:59:39 +01:00
svlandeg	543073bf9d	add pretrain example	2020-12-29 14:51:23 +01:00
svlandeg	1d0ef98873	move example	2020-12-29 14:46:03 +01:00
svlandeg	20113b8063	add train CLI example	2020-12-29 14:44:56 +01:00
Sofie Van Landeghem	87562e470d	fix backticks in docs (#6635 )	2020-12-27 22:12:37 +01:00

1735 Commits