spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-12-26 09:56:28 +03:00

Author	SHA1	Message	Date
Adriane Boyd	d2bdaa7823	Replace negative rows with 0 in StaticVectors (#7674) * Replace negative rows with 0 in StaticVectors Replace negative row indices with 0-vectors in `StaticVectors`. * Increase versions related to StaticVectors * Increase versions of all architctures and layers related to `StaticVectors` * Improve efficiency of 0-vector operations Parallel `spacy-legacy` PR: https://github.com/explosion/spacy-legacy/pull/5 * Update config defaults to new versions * Update docs	2021-04-22 18:04:15 +10:00
Sofie Van Landeghem	6f565cf39d	fix typo in entity_linker docs	2021-04-22 09:59:24 +02:00
Sofie Van Landeghem	2e746dbf32	update EL training data format in docs (#7839) * update EL training data format * fix typo * all -1 because reasons	2021-04-22 08:50:09 +02:00
Shantam Raj	6017fcf693	Default code for Setting Entity annotations on the website errors (#7738) * the default example for "Setting entity annotations" errors on Binder * updating contributer info * using a new variable to store original entities	2021-04-21 09:16:32 +02:00
langdonholmes	df541c6b5e	Update processing-pipelines.md to mention method for doc metadata (#7480) * Update processing-pipelines.md Under "things to try," inform users they can save metadata when using nlp.pipe(foobar, as_tuples=True) Link to a new example on the attributes page detailing the following: > ``` > data = [ > ("Some text to process", {"meta": "foo"}), > ("And more text...", {"meta": "bar"}) > ] > > for doc, context in nlp.pipe(data, as_tuples=True): > # Let's assume you have a "meta" extension registered on the Doc > doc._.meta = context["meta"] > ``` from https://stackoverflow.com/questions/57058798/make-spacy-nlp-pipe-process-tuples-of-text-and-additional-information-to-add-as * Updating the attributes section Update the attributes section with example of how extensions can be used to store metadata. * Update processing-pipelines.md * Update processing-pipelines.md Made as_tuples example executable and relocated to the end of the "Processing Text" section. * Update processing-pipelines.md * Update processing-pipelines.md Removed extra line * Reformat and rephrase Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2021-04-19 11:58:12 +02:00
Adriane Boyd	0e7f94b247	Update Tokenizer.explain with special matches (#7749) * Update Tokenizer.explain with special matches Update `Tokenizer.explain` and the pseudo-code in the docs to include the processing of special cases that contain affixes or whitespace. * Handle optional settings in explain * Add test for special matches in explain Add test for `Tokenizer.explain` for special cases containing affixes.	2021-04-19 19:08:20 +10:00
Sofie Van Landeghem	c786e98e56	assemble CLI command (#7783) * assemble CLI command * ensure assemble runs even without training section * cleanup	2021-04-19 18:39:11 +10:00
Bram Vanroy	ed561cf428	Terminology: deprecated vs obsolete (#7621) * Terminology: deprecated vs obsolete Typically, deprecated is used for functionality that is bound to become unavailable but that can still be used. Obsolete is used for features that have been removed. In E941, I think what is meant is "obsolete" since loading a model by a shortcut simply does not work anymore (and throws an error). This is different from downloading a model with a shortcut, which is deprecated but still works. In light of this, perhaps all other error codes should be checked as well. * clarify that the link command is removed and not just deprecated Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>	2021-04-12 14:37:00 +02:00
Adriane Boyd	673e2bc4c0	Add usage docs for streamed train corpora (#7693)	2021-04-09 16:15:38 +02:00
Sofie Van Landeghem	204c2f116b	Extend score_spans for overlapping & non-labeled spans (#7209) * extend span scorer with consider_label and allow_overlap * unit test for spans y2x overlap * add score_spans unit test * docs for new fields in scorer.score_spans * rename to include_label * spell out if-else for clarity * rename to 'labeled' Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2021-04-08 12:19:17 +02:00
broaddeep	ee159b8543	Support match alignments (#7321) * Support match alignments * change naming from match_alignments to with_alignments, add conditional flow if with_alignments is given, validate with_alignments, add related test case * remove added errors, utilize bint type, cleanup whitespace * fix no new line in end of file * Minor formatting * Skip alignments processing if as_spans is set * Add with_alignments to Matcher API docs * Update website/docs/api/matcher.md Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-04-08 18:10:14 +10:00
Ines Montani	de4f4c9b8a	Add more link anchors [ci skip]	2021-04-06 14:15:21 +10:00
Ines Montani	5bbdd7dc4c	Update pipeline design docs [ci skip]	2021-04-06 14:13:22 +10:00
Ines Montani	1d1cfadbca	Fix formatting [ci skip]	2021-04-06 14:13:13 +10:00
Ayush Chaurasia	3c2ce41dd8	W&B integration: Optional support for dataset and model checkpoint logging and versioning (#7429) * Add optional artifacts logging * Update docs * Update spacy/training/loggers.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update spacy/training/loggers.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update spacy/training/loggers.py Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Bump WandbLogger Version * Add documentation of v1 to legacy docs * bump spacy-legacy to 3.0.2 (to be released) Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>	2021-04-01 19:36:23 +02:00
Sofie Van Landeghem	59c2069eb1	Legacy docs (#7601) * document legacy Tok2Vec architectures * add TextCatEnsemble.v1 legacy documentation * Separate legacy section in side bar	2021-03-30 12:43:14 +02:00
Santiago Castro	af07fc3bc1	Add support for CUDA 11.2 (#7583) * Add support for CUDA 11.2 * Update the docs * Format Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2021-03-30 09:47:33 +02:00
Álvaro Abella Bascarán	5b4dde38a3	fix fn name: tokenizer.infixes_finditer -> tokenizer.infix_finditer (#7606)	2021-03-30 09:45:49 +02:00
Ines Montani	be55f43163	Merge pull request #7473 from adrianeboyd/docs/v3-pipeline-deps-order	2021-03-22 12:43:07 +01:00
Ines Montani	3ee2fcfba0	Merge pull request #7483 from adrianeboyd/docs/various-v3-4 [ci skip]	2021-03-22 12:37:06 +01:00
Ines Montani	88e5a0dc16	Merge pull request #7504 from polm/fix/lexeme-docs [ci skip] Fix mismatched backtick in Lexeme docs	2021-03-22 12:36:44 +01:00
Adriane Boyd	0d2b723e8d	Update entity setting section	2021-03-20 11:38:55 +01:00
Paul O'Leary McCann	e39c0dcf33	Fix mismatched backtick in Lexeme docs	2021-03-20 18:40:00 +09:00
Adriane Boyd	c771ec22f0	Update matcher errors and docs * Mention `tagger+attribute_ruler` in `POS`/`MORPH` error messages for `Matcher` and `PhraseMatcher` * Document `Matcher.__call__(allow_missing=)`	2021-03-19 10:11:18 +01:00
Adriane Boyd	6a9a467766	Update website/docs/usage/processing-pipelines.md Co-authored-by: Ines Montani <ines@ines.io>	2021-03-19 08:12:49 +01:00
Adriane Boyd	6354b642c5	Fix typo	2021-03-18 19:01:10 +01:00
Adriane Boyd	40e5d3a980	Update saving/loading example	2021-03-18 16:56:10 +01:00
Adriane Boyd	0fb1881f36	Reformat processing pipelines	2021-03-18 13:31:42 +01:00
Adriane Boyd	acc58719da	Update custom similarity hooks example	2021-03-18 13:31:42 +01:00
Adriane Boyd	c9e1a9ac17	Add multiprocessing section	2021-03-18 13:31:42 +01:00
Adriane Boyd	9a254d3995	Include all en_core_web_sm components in examples	2021-03-18 13:31:42 +01:00
Adriane Boyd	83c1b919a7	Fix positional/option in CLI types	2021-03-18 13:31:42 +01:00
Adriane Boyd	9fd41d6742	Remove Language.pipe cleanup arg	2021-03-18 13:31:42 +01:00
Adriane Boyd	5da323fd86	Minor edits	2021-03-17 12:59:05 +01:00
Adriane Boyd	a5ffe8dfed	Add details about pretrained pipeline design	2021-03-17 11:31:26 +01:00
bsweileh	61472e7cb3	Update _training.md - Fix broken link on backpropagation (#7431) * Update _training.md Fix broken link on backpropagation * Add agreement add spacy contributor agreement	2021-03-15 09:21:35 +01:00
Ines Montani	c67d5a6eb0	Merge pull request #7394 from adrianeboyd/docs/ner-example-data-readme	2021-03-13 04:26:18 +01:00
Ines Montani	068b97a617	Merge pull request #7408 from adrianeboyd/bugfix/load-keyword-only	2021-03-13 04:25:50 +01:00
Adriane Boyd	3168103605	Fix type of spacy train --output in docs	2021-03-12 10:04:57 +01:00
Adriane Boyd	03e9e7b567	Add --code option to init fill-config	2021-03-12 10:03:57 +01:00
Adriane Boyd	124304b146	Add vocab kwarg back to spacy.load * Additional minor formatting and docs cleanup	2021-03-11 10:58:59 +01:00
Adriane Boyd	84470d9b9e	Incorporate BILUO note from #7407	2021-03-11 10:11:21 +01:00
Adriane Boyd	4294bcf4ab	Align keyword-only in docs for init/util	2021-03-11 09:52:40 +01:00
Adriane Boyd	28726c25a1	Update docs for convert CLI and NER examples	2021-03-10 11:42:02 +01:00
Adriane Boyd	d746ea6278	Add warning about GPU selection in Jupyter notebooks (#7075) * Initial warning * Update check * Redo edit * Move jupyter warning to helper method * Add link with details to warnings	2021-03-09 15:35:21 +01:00
Sofie Van Landeghem	932887b950	textcat scoring fix and multi_label docs (#6974) * add multi-label textcat to menu * add infobox on textcat API * add info to v3 migration guide * small edits * further fixes in doc strings * add infobox to textcat architectures * add textcat_multilabel to overview of built-in components * spelling * fix unrelated warn msg * Add textcat_multilabel to quickstart [ci skip] * remove separate documentation page for multilabel_textcategorizer * small edits * positive label clarification * avoid duplicating information in self.cfg and fix textcat.score * fix multilabel textcat too * revert threshold to storage in cfg * revert threshold stuff for multi-textcat Co-authored-by: Ines Montani <ines@ines.io>	2021-03-09 23:04:22 +11:00
Sofie Van Landeghem	cd70c3cb79	Fixing pretrain (#7342) * initialize NLP with train corpus * add more pretraining tests * more tests * function to fetch tok2vec layer for pretraining * clarify parameter name * test different objectives * formatting * fix check for static vectors when using vectors objective * clarify docs * logger statement * fix init_tok2vec and proc.initialize order * test training after pretraining * add init_config tests for pretraining * pop pretraining block to avoid config validation errors * custom errors	2021-03-09 14:01:13 +11:00
Ines Montani	dfb23a419e	Merge branch 'spacy.io' [ci skip]	2021-03-06 17:38:54 +11:00
graue70	7d085d5b1c	Fix typo in docs	2021-03-05 18:30:09 +01:00
svlandeg	682a6232e3	fix typo	2021-03-02 17:59:13 +01:00
svlandeg	d900c55061	consistently use registry as callable	2021-03-02 17:56:28 +01:00
graue70	0fddc0447c	Fix copy & paste error in API docs	2021-03-02 14:00:14 +01:00
Ines Montani	8f7c7b2658	Merge pull request #7211 from svlandeg/docs/el_update [ci skip] kb.get_candidates renamed to get_alias_candidates	2021-02-27 11:51:22 +11:00
Ines Montani	408b94887a	Merge pull request #7207 from adrianeboyd/docs/get-noun-chunks [ci skip] Extend docs related to Vocab.get_noun_chunks	2021-02-27 11:51:08 +11:00
svlandeg	248339039e	fix type in docs	2021-02-26 14:27:10 +01:00
svlandeg	08fd901a1b	kb.get_candidates renamed to get_alias_candidates	2021-02-25 20:09:36 +01:00
Adriane Boyd	6a37f343d5	Extend docs related to Vocab.get_noun_chunks	2021-02-25 16:38:21 +01:00
Ines Montani	24cecbb3f4	Merge pull request #7126 from adrianeboyd/docs/gpu-id-opt [ci skip] Add tip about --gpu-id to training quickstart	2021-02-24 22:34:17 +11:00
Ken	fa7ddc7f88	Update sentencizer documentation example with sentencizer pipe name (#7185)	2021-02-24 08:06:54 +01:00
Tocic	b1996a51a1	fix typo in models.md (#7157)	2021-02-22 09:00:38 +01:00
Sofie Van Landeghem	b92f81d5da	fix NEL config and IO, and n_sents functionality (#7100) * fix NEL config and IO, and n_sents functionality * add docs * fix test	2021-02-22 14:49:52 +11:00
Sofie Van Landeghem	ba5a50f62b	NEL docs & UX (#7129) * EL set_kb docs fix * custom warning for set_kb mistake	2021-02-22 11:04:22 +11:00
Adriane Boyd	7198be0f4b	Add tip about --gpu-id to training quickstart	2021-02-19 14:07:51 +01:00
Sofie Van Landeghem	709c9e75af	span.ent only returns first sentence (#7084) * return first sentence when span contains sentence boundary * docs fix * small fixes * cleanup	2021-02-19 23:02:38 +11:00
palandlom	9b82586699	var batch is useless (#7111) It seems that nlp.update(examples) should be nlp.update(batch)	2021-02-18 09:44:22 +01:00
Ines Montani	fc4fb6eb3a	Make v2.x docs more prominent [ci skip]	2021-02-17 23:42:27 +11:00
Ines Montani	6b9026a219	Merge pull request #7000 from explosion/feature/project-yml-overrides Support env vars and CLI overrides for project.yml	2021-02-11 12:31:45 +11:00
Peter Baumann	61b04a70d5	Run PhraseMatcher on Spans (#6918) * Add regression test * Run PhraseMatcher on Spans * Add test for PhraseMatcher on Spans and Docs * Add SCA * Add test with 3 matches in Doc, 1 match in Span * Update docs * Use doc.length for find_matches in tokenizer Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>	2021-02-10 23:43:32 +11:00
Ines Montani	c08b3f294c	Support env vars and CLI overrides for project.yml	2021-02-10 13:45:27 +11:00
svlandeg	9a7f33c916	final 3.0 benchmark numbers	2021-02-09 21:28:33 +01:00
Ines Montani	ca3f8386d7	Merge pull request #6975 from svlandeg/fix/link [ci skip] fix link	2021-02-09 14:34:11 +11:00
tarskiandhutch	e897e7aaad	Line 70: syntax error Original config definition treated dictionary key as a function argument.	2021-02-08 15:24:57 -05:00
svlandeg	bb7482bef8	fix link	2021-02-08 18:39:59 +01:00
Sofie Van Landeghem	6ed423c16c	reduce memory load when reading all vectors from file (#6945) * reduce memory load when reading all vectors from file * one more small typo fix	2021-02-07 08:05:43 +08:00
Ines Montani	433835d9b0	Merge pull request #6889 from adrianeboyd/docs/source-install-dup [ci skip]	2021-02-05 13:35:16 +11:00
svlandeg	7cda5605a0	add type	2021-02-03 13:13:58 +01:00
svlandeg	94929c2b98	small doc fixes	2021-02-03 13:10:22 +01:00
Ines Montani	2cdfcd2d19	Update naming [ci skip]	2021-02-03 12:48:31 +11:00
Adriane Boyd	37a68a06ab	Update to recommend editable installs for source installs	2021-02-02 16:51:27 +01:00
Adriane Boyd	3a3e4daf60	Update install instructions * Remove duplicate section about compiling from source	2021-02-02 14:44:15 +01:00
Pengcheng YIN	6fdc33203a	Fix a typo	2021-02-01 17:26:28 -05:00
Ines Montani	a59f3fcf5d	Make wheel the default format and update docs [ci skip]	2021-02-01 23:18:43 +11:00
Ines Montani	31b842d6ce	Update table [ci skip]	2021-02-01 14:17:52 +11:00
Ines Montani	7752f80f39	Update docs [ci skip]	2021-01-31 16:11:24 +11:00
Ines Montani	a8a1231ccd	Update README and docs [ci skip]	2021-01-31 12:36:04 +11:00
Ines Montani	45c551037d	Update CLI docs [ci skip]	2021-01-30 21:50:23 +11:00
Ines Montani	ae07416fda	Merge branch 'website/v3-launch' into develop	2021-01-30 20:31:06 +11:00
Ines Montani	2332c4280b	Update and use unified --build option	2021-01-30 13:11:36 +11:00
Ines Montani	2609ba4e89	Support building wheel in spacy package	2021-01-30 11:54:02 +11:00
Ines Montani	95e958a229	Merge pull request #6852 from explosion/feature/replace-listeners	2021-01-30 00:58:08 +11:00
Ines Montani	7694f76dd1	Update warning and mention replace_listeners	2021-01-29 23:46:01 +11:00
Adriane Boyd	8b76cb8095	Rephrase transformers PyTorch instructions	2021-01-29 13:36:56 +01:00
Ines Montani	095055ac48	Merge pull request #6855 from adrianeboyd/docs/trf-sentencepiece [ci skip] Update transfomers install docs	2021-01-29 23:34:01 +11:00
Adriane Boyd	e3e87e7275	Update transfomers install docs * Recommend installing PyTorch separately * Add instructions for `sentencepiece`	2021-01-29 13:27:43 +01:00
Ines Montani	e766e8c56d	Apply suggestions from code review Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2021-01-29 21:41:17 +11:00
svlandeg	d7d838281c	adding new="3" mentions in the doc	2021-01-29 11:26:37 +01:00
Ines Montani	99af9e7125	Update documentation	2021-01-29 18:45:48 +11:00
Sofie Van Landeghem	24a697abb8	avoid empty aliases and improve UX and docs (#6840)	2021-01-29 08:51:40 +08:00
Sofie Van Landeghem	837a4f53c2	Error handling in nlp.pipe (#6817) * add error handler for pipe methods * add unit tests * remove pipe method that are the same as their base class * have Language keep track of a default error handler * cleanup * formatting * small refactor * add documentation	2021-01-29 08:51:21 +08:00
Ines Montani	230e651ad6	Merge branch 'develop' into master-tmp	2021-01-27 13:26:29 +11:00
Ines Montani	634ae609b4	Adjust formatting [ci skip]	2021-01-27 13:08:00 +11:00
Ines Montani	5d79d1af50	Merge pull request #6796 from svlandeg/docs/benchmarks [ci skip]	2021-01-27 13:01:23 +11:00
Ines Montani	1ed7029d47	Update website for v3 launch	2021-01-27 12:39:47 +11:00
Adriane Boyd	c447aa2b98	Update --code arg in evaluate CLI docs	2021-01-26 15:30:46 +01:00
jganseman	907bce7a78	Merge pull request #1 from jganseman/patch-1 Patch 1	2021-01-26 11:12:30 +01:00
jganseman	8bc57ec372	also update is_oov in lexeme docs	2021-01-26 11:09:16 +01:00
jganseman	1f2b0ec168	proposing a more concise explanation for is_oov proposing a more concise explanation for is_oov	2021-01-26 10:53:39 +01:00
Matthew Honnibal	f049df1715	Revert "Set annotations in update" (#6810) * Revert "Set annotations in update (#6767)" This reverts commit `e680efc7cc`. * Fix version * Update spacy/pipeline/entity_linker.py * Update spacy/pipeline/entity_linker.py * Update spacy/pipeline/tagger.pyx * Update spacy/pipeline/tok2vec.py * Update spacy/pipeline/tok2vec.py * Update spacy/pipeline/transition_parser.pyx * Update spacy/pipeline/transition_parser.pyx * Update website/docs/api/multilabel_textcategorizer.md * Update website/docs/api/tok2vec.md * Update website/docs/usage/layers-architectures.md * Update website/docs/usage/layers-architectures.md * Update website/docs/api/transformer.md * Update website/docs/api/textcategorizer.md * Update website/docs/api/tagger.md * Update spacy/pipeline/entity_linker.py * Update website/docs/api/sentencerecognizer.md * Update website/docs/api/pipe.md * Update website/docs/api/morphologizer.md * Update website/docs/api/entityrecognizer.md * Update spacy/pipeline/entity_linker.py * Update spacy/pipeline/multitask.pyx * Update spacy/pipeline/tagger.pyx * Update spacy/pipeline/tagger.pyx * Update spacy/pipeline/textcat.py * Update spacy/pipeline/textcat.py * Update spacy/pipeline/textcat.py * Update spacy/pipeline/tok2vec.py * Update spacy/pipeline/trainable_pipe.pyx * Update spacy/pipeline/trainable_pipe.pyx * Update spacy/pipeline/transition_parser.pyx * Update spacy/pipeline/transition_parser.pyx * Update website/docs/api/entitylinker.md * Update website/docs/api/dependencyparser.md * Update spacy/pipeline/trainable_pipe.pyx	2021-01-25 22:18:45 +08:00
Adriane Boyd	61c9f8bf24	Remove transformers model max length section (#6807)	2021-01-25 19:59:34 +08:00
svlandeg	56064faed9	update caption	2021-01-23 00:57:00 +01:00
svlandeg	d7c0f40a96	update comment	2021-01-22 18:55:18 +01:00
svlandeg	a071279bc7	add speed comparison to docs	2021-01-22 18:46:35 +01:00
svlandeg	b132cb3036	update accuracies for new a1 models	2021-01-21 20:24:05 +01:00
Adriane Boyd	d0236136a2	Fix default config init in Transformer API docs (#6781)	2021-01-21 23:18:03 +08:00
Sofie Van Landeghem	e680efc7cc	Set annotations in update (#6767) * bump to 3.0.0rc4 * do set_annotations in component update calls * update docs and remove set_annotations flag * fix EL test	2021-01-20 11:49:25 +11:00
Sofie Van Landeghem	57640aa838	warn when frozen components break listener pattern (#6766) * warn when frozen components break listener pattern * few notes in the documentation * update arg name * formatting * cleanup * specify listeners return type	2021-01-20 11:12:35 +11:00
Ines Montani	4a1029a9b6	Add infobox [ci skip]	2021-01-19 19:18:39 +11:00
Ines Montani	f50502dad7	Update docs [ci skip]	2021-01-19 00:22:47 +11:00
Sofie Van Landeghem	fed8f48965	raise NotImplementedError when noun_chunks iterator is not implemented (#6711) * raise NotImplementedError when noun_chunks iterator is not implemented * bring back, fix and document span.noun_chunks * formatting Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>	2021-01-17 19:56:05 +08:00
Adriane Boyd	bf0cdae8d4	Add token_splitter component (#6726) * Add long_token_splitter component Add a `long_token_splitter` component for use with transformer pipelines. This component splits up long tokens like URLs into smaller tokens. This is particularly relevant for pretrained pipelines with `strided_spans`, since the user can't change the length of the span `window` and may not wish to preprocess the input texts. The `long_token_splitter` splits tokens that are at least `long_token_length` tokens long into smaller tokens of `split_length` size. Notes: * Since this is intended for use as the first component in a pipeline, the token splitter does not try to preserve any token annotation. * API docs to come when the API is stable. * Adjust API, add test * Fix name in factory	2021-01-17 19:54:41 +08:00
Adriane Boyd	9328dd5625	Handle unset token.morph in Morphologizer (#6704) * Handle unset token.morph in Morphologizer Handle unset `token.morph` in `Morphologizer.initialize` and `Morphologizer.get_loss`. If both `token.morph` and `token.pos` are unset, treat the annotation as missing rather than empty. * Add token.has_morph()	2021-01-15 17:20:10 +01:00
Adriane Boyd	0c936004d1	Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-rc3	2021-01-14 11:49:58 +01:00
Matthew Honnibal	f277bfdf0f	Add SpanGroup and Graph container types to represent arbitrary annotations (#6696) * Draft out initial Spans data structure * Initial span group commit * Basic span group support on Doc * Basic test for span group * Compile span_group.pyx * Draft addition of SpanGroup to DocBin * Add deserialization for SpanGroup * Add tests for serializing SpanGroup * Fix serialization of SpanGroup * Add EdgeC and GraphC structs * Add draft Graph data structure * Compile graph * More work on Graph * Update GraphC * Upd graph * Fix walk functions * Let Graph take nodes and edges on construction * Fix walking and getting * Add graph tests * Fix import * Add module with the SpanGroups dict thingy * Update test * Rename 'span_groups' attribute * Try to fix c++11 compilation * Fix test * Update DocBin * Try to fix compilation * Try to fix graph * Improve SpanGroup docstrings * Add doc.spans to documentation * Fix serialization * Tidy up and add docs * Update docs [ci skip] * Add SpanGroup.has_overlap * WIP updated Graph API * Start testing new Graph API * Update Graph tests * Update Graph * Add docstring Co-authored-by: Ines Montani <ines@ines.io>	2021-01-14 17:30:41 +11:00
Adriane Boyd	a45d89f09a	Add initialize.before_init and after_init callbacks Add `initialize.before_init` and `initialize.after_init` callbacks to the config. The `initialize.before_init` callback is a place to implement one-time tokenizer customizations that are then saved with the model.	2021-01-12 13:07:44 +01:00
Sofie Van Landeghem	a612a5ba3f	fix small typos (#6698)	2021-01-08 09:39:47 +01:00
Sofie Van Landeghem	75d9019343	Fix types of Tok2Vec encoding architectures (#6442) * fix TorchBiLSTMEncoder documentation * ensure the types of the encoding Tok2vec layers are correct * update references from v1 to v2 for the new architectures	2021-01-07 16:39:27 +11:00
Sofie Van Landeghem	82ae95267a	Docs for pretrain architectures (#6605) * document pretraining architectures * formatting * bit more info * small fixes	2021-01-06 16:12:30 +11:00
Sofie Van Landeghem	afc5714d32	multi-label textcat component (#6474) * multi-label textcat component * formatting * fix comment * cleanup * fix from #6481 * random edit to push the tests * add explicit error when textcat is called with multi-label gold data * fix error nr * small fix	2021-01-06 13:07:14 +11:00
Ines Montani	6f83abb971	Merge pull request #6647 from svlandeg/feature/init_config_overwrite	2021-01-05 14:59:04 +11:00
Ines Montani	3614472e29	Merge pull request #6646 from svlandeg/feature/cli-docs [ci skip]	2021-01-05 13:52:49 +11:00
Ines Montani	9c078a5885	Update formatting for consistency [ci skip]	2021-01-05 13:52:28 +11:00
Ines Montani	a9e845426f	Use --force for consistency and add docs	2021-01-05 13:49:59 +11:00
svlandeg	d5ff0fecf8	add docs	2020-12-30 14:01:13 +01:00
svlandeg	2fa23b0304	fix capitalization for link	2020-12-29 15:01:22 +01:00
svlandeg	43cc6aea93	remove non-existing link	2020-12-29 14:59:39 +01:00
svlandeg	543073bf9d	add pretrain example	2020-12-29 14:51:23 +01:00
svlandeg	1d0ef98873	move example	2020-12-29 14:46:03 +01:00
svlandeg	20113b8063	add train CLI example	2020-12-29 14:44:56 +01:00
Sofie Van Landeghem	87562e470d	fix backticks in docs (#6635)	2020-12-27 22:12:37 +01:00
Sofie Van Landeghem	8df5b7f513	fix documentation of 'path' in tokenizer.to_disk (#6634)	2020-12-27 22:01:06 +01:00
Sofie Van Landeghem	282a3b49ea	Fix parser resizing when there is no upper layer (#6460) * allow resizing of the parser model even when upper=False * update from spacy.TransitionBasedParser.v1 to v2 * bugfix	2020-12-18 18:56:57 +08:00
Gareth Sparks	efc229c3f4	Doc.char_span arg: alignment_mode (#6591) Currently labeled "mode", actually "alignment_mode"	2020-12-18 09:54:56 +01:00
Ines Montani	85ca8c2bdd	Merge branch 'master' into develop	2020-12-11 13:44:41 +11:00
Ines Montani	fb43a30a71	Merge pull request #6545 from svlandeg/feature/discussions [ci skip]	2020-12-11 10:20:35 +11:00
svlandeg	5afa567767	replace gitter with discussions in 101	2020-12-10 20:17:36 +01:00
Adriane Boyd	27bb75e2a0	Docs and extras updates for v2.3.5 * Update install instructions for updated packages * Add `cuda110` and `cuda111` extras, remove upper `cupy` pins (only compatible with `thinc>=7.4.4`)	2020-12-10 15:34:34 +01:00
Ines Montani	513c4e332a	Include custom code via spacy package command (#6531)	2020-12-10 20:36:46 +08:00
Ines Montani	2a6043fabb	Merge pull request #6530 from explosion/feature/init-config-cpu-gpu	2020-12-10 09:38:46 +11:00
Ines Montani	9d32e839d3	Merge branch 'develop' into feature/init-config-cpu-gpu	2020-12-10 08:50:53 +11:00
Adriane Boyd	972820e2b3	Add batch_size to data formats docs	2020-12-09 12:44:04 +01:00
Adriane Boyd	80ac8af1bf	Format	2020-12-09 12:44:01 +01:00
Adriane Boyd	795b5bd049	Update website/docs/api/language.md Co-authored-by: Ines Montani <ines@ines.io>	2020-12-09 12:23:32 +01:00
Adriane Boyd	fa8fa474a3	Add nlp.batch_size setting Add a default `batch_size` setting for `Language.pipe` and `Language.evaluate` as `nlp.batch_size`.	2020-12-09 09:13:26 +01:00
Ines Montani	34449b66fd	Update matcher.md	2020-12-09 11:09:45 +11:00
Ines Montani	1980203229	Merge branch 'master' into pr/6444	2020-12-09 11:09:40 +11:00
Ines Montani	05a2812ae0	Merge branch 'develop' into pr/6444	2020-12-09 11:04:03 +11:00
Ines Montani	758ad6c3cd	Make CPU the default for init config	2020-12-09 11:00:51 +11:00
Ines Montani	8921364579	Merge pull request #6521 from explosion/feature/config-stdin Allow reading config from stdin in spacy train	2020-12-08 22:07:43 +11:00
Ines Montani	94a5a9814f	Update argument handling and documentation	2020-12-08 20:41:18 +11:00
Adriane Boyd	5ceac425ee	Remove non-working --use-chars from train CLI Remove the non-working `--use-chars` option from the train CLI. The implementation of the option across component types and the CLI settings could be fixed, but the `CharacterEmbed` model does not work on GPU in v2 so it's better to remove it.	2020-12-08 08:30:00 +01:00
Ines Montani	ef59ce783b	Adjust install instructions [ci skip]	2020-12-08 18:06:50 +11:00
Sofie Van Landeghem	2c27093c5f	require_cpu functionality (#6336) * add require_cpu from Thinc 8.0.0rc2 * add docs * fix test if cupy is not installed	2020-12-08 14:42:40 +08:00
Ines Montani	d8e01ca931	Merge pull request #6391 from adrianeboyd/docs/install-guide	2020-12-08 07:42:16 +01:00
Ines Montani	ee2ec52f48	Merge pull request #6409 from svlandeg/feature/trf-docs	2020-12-08 06:32:10 +01:00
Ines Montani	c2b196c2c1	Merge pull request #6419 from svlandeg/feature/rel-docs	2020-12-08 06:30:41 +01:00
Ines Montani	82e88f0e3b	Merge pull request #6379 from svlandeg/fix/labels-constructor	2020-12-08 06:29:56 +01:00
Adriane Boyd	1442d2f213	Improve simple training example in v3 migration (#6438) * Create the examples once * Use the examples in the initialization * Provide the batch size * Fix `begin_training` migration example	2020-11-30 09:39:45 +08:00
Adriane Boyd	03ae77e603	Add SPACY as a Matcher attribute (#6463)	2020-11-30 09:34:50 +08:00
Adriane Boyd	724831b066	Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master * Update Macedonian for v3 * Update Turkish for v3	2020-11-25 11:49:34 +01:00
Jacob Bortell	fe9009911a	Update rule-based-matching.md (#6421) * Update rule-based-matching.md Clarified case-sensititivy of dictionary-referencing attributes (POS/TAG/DEP/etc). Clarified "Type" column header to "Value Type" * Update rule-based-matching.md Improved clarity of wording	2020-11-24 16:20:19 +01:00
Adriane Boyd	6f133877aa	Update source install instructions * Don't recommend an editable install in the default source instructions. * Use `pip install --no-build-isolation` for editable installs. * Remove reference to `virtualenv`.	2020-11-24 14:44:13 +01:00
svlandeg	218abaa69a	typo	2020-11-20 22:36:49 +01:00
svlandeg	e861e928df	more small corrections	2020-11-20 22:29:58 +01:00
svlandeg	5ac0867427	final fixes	2020-11-20 22:18:53 +01:00
svlandeg	331ec83493	edits and updates to implementing REL component docs	2020-11-20 21:41:52 +01:00
svlandeg	4a3e611abc	small fixes and formatting	2020-11-20 15:55:05 +01:00
svlandeg	124f49feb6	update REL model code	2020-11-20 15:25:20 +01:00
svlandeg	636be3c791	Merge remote-tracking branch 'upstream/develop' into feature/trf-docs	2020-11-19 14:15:35 +01:00
Sofie Van Landeghem	165993d8e5	fix typo in transformer docs (#6404)	2020-11-19 14:11:38 +01:00
Adriane Boyd	96726ec1f6	Fix DocBin init in training example (#6396)	2020-11-17 14:36:44 +01:00
Adriane Boyd	ed32fa80cd	Update source install instructions * Use `pip install` instead of `python setup.py install` * For developers recommend: * `python setup.py build_ext --inplace -j N` * `python setup.py develop`	2020-11-16 10:13:51 +01:00
svlandeg	99d0412b6e	add link to REL project	2020-11-15 18:35:56 +01:00
svlandeg	73fc1ed963	remove labels from morphologizer constructor	2020-11-11 21:48:50 +01:00
svlandeg	fcd79e0655	remove set_morphology from docs	2020-11-11 21:32:34 +01:00
Ines Montani	de6453940e	Merge pull request #6305 from svlandeg/feature/score-docs [ci skip]	2020-11-10 02:52:11 +01:00
Ines Montani	d7950c5ada	Merge pull request #6297 from adrianeboyd/docs/nightly-conda-install [ci skip]	2020-11-10 02:45:52 +01:00
svlandeg	789fb3d124	add docs for upstream argument of TransformerListener	2020-11-09 21:42:58 +01:00
Ines Montani	363ac73c72	Update docs [ci skip]	2020-11-09 12:43:26 +08:00
Adriane Boyd	8644ee3e3f	Update TIGER link and tag description (#6344)	2020-11-05 09:33:00 +01:00
Sofie Van Landeghem	8ef056cf98	fix embed_size in Entity Linker architecture (#6343)	2020-11-04 22:20:13 +01:00
Ines Montani	019a1dd5e8	Fix v3 overview [ci skip]	2020-11-03 18:10:06 +01:00
Adriane Boyd	a4b32b9552	Handle missing reference values in scorer (#6286) * Handle missing reference values in scorer Handle missing values in reference doc during scoring where it is possible to detect an unset state for the attribute. If no reference docs contain annotation, `None` is returned instead of a score. `spacy evaluate` displays `-` for missing scores and the missing scores are saved as `None`/`null` in the metrics. Attributes without unset states: * `token.head`: relies on `token.dep` to recognize unset values * `doc.cats`: unable to handle missing annotation Additional changes: * add optional `has_annotation` check to `score_scans` to replace `doc.sents` hack * update `score_token_attr_per_feat` to handle missing and empty morph representations * fix bug in `Doc.has_annotation` for normalization of `IS_SENT_START` vs. `SENT_START` * Fix import * Update return types	2020-11-03 15:47:18 +01:00
Adriane Boyd	dc816bba9d	Fix node name typo in dependency matcher example (#6311)	2020-10-28 16:32:46 +01:00
svlandeg	77688b0072	fix config	2020-10-26 11:14:34 +01:00
svlandeg	5878ff6bcd	cleanup	2020-10-26 11:13:02 +01:00
svlandeg	e95d9caa87	small edits	2020-10-26 11:09:25 +01:00
svlandeg	a664994a81	adding score method to explanation of new component	2020-10-26 10:52:47 +01:00
Adriane Boyd	c0b76f4c19	Add install step to "Compile from source"	2020-10-23 11:36:36 +02:00
Ines Montani	b6b1c1e23c	Merge pull request #6271 from walterhenry/develop-proof [ci skip]	2020-10-19 16:31:43 +02:00
walterhenry	db24dc5614	Proofread remarks I think these may the last remarks for the nightly docs. Only two minor things actually.	2020-10-19 11:11:32 +02:00
Sofie Van Landeghem	75a202ce65	TextCat updates and fixes (#6263) * small fix in example imports * throw error when train_corpus or dev_corpus is not a string * small fix in custom logger example * limit macro_auc to labels with 2 annotations * fix typo * also create parents of output_dir if need be * update documentation of textcat scores * refactor TextCatEnsemble * fix tests for new AUC definition * bump to 3.0.0a42 * update docs * rename to spacy.TextCatEnsemble.v2 * spacy.TextCatEnsemble.v1 in legacy * cleanup * small fix * update to 3.0.0rc2 * fix import that got lost in merge * cursed IDE * fix two typos	2020-10-18 14:50:41 +02:00
Ines Montani	c655742b8b	Remove docs references to starters for now (see #6262) [ci skip]	2020-10-16 15:46:34 +02:00
Ines Montani	3851300e80	Update landing [ci skip]	2020-10-16 11:46:33 +02:00
Ines Montani	c968d1560f	Fix docs example [ci skip]	2020-10-16 11:33:20 +02:00
Ines Montani	ba1e004049	Fix typo [ci skip]	2020-10-15 23:39:04 +02:00
Ines Montani	20f80587d6	Merge pull request #6257 from walterhenry/develop-proof A few tiny typo fixes to push through with release of nightly	2020-10-15 18:17:30 +02:00
walterhenry	75b7f86383	Three small typos Some little typos since v3.0 is out.	2020-10-15 18:06:37 +02:00
Ines Montani	09dbbe75d7	Update docs [ci skip]	2020-10-15 17:27:24 +02:00
Ines Montani	7f05ccc170	Update docs [ci skip]	2020-10-15 12:35:30 +02:00
Ines Montani	4fa869e6f7	Update docs [ci skip]	2020-10-15 11:16:06 +02:00
Ines Montani	178760855f	Merge branch 'develop' into master-tmp	2020-10-15 09:06:03 +02:00
Ines Montani	abeafcbc08	Update docs [ci skip]	2020-10-15 08:58:30 +02:00
Ines Montani	a966c271f7	Update models docs [ci skip]	2020-10-14 20:50:23 +02:00
Ines Montani	a2d4aaee70	Apply suggestions from code review	2020-10-14 19:51:36 +02:00
Ines Montani	d94e241fce	Merge branch 'develop' into pr/6253	2020-10-14 16:55:46 +02:00
Ines Montani	cb47f25cda	Merge pull request #6252 from svlandeg/fix/docs	2020-10-14 16:43:12 +02:00
walterhenry	6af585dba5	New batch of proofs Just tiny fixes to the docs as a proofreader	2020-10-14 16:37:57 +02:00
svlandeg	478a14a619	fix few typos	2020-10-14 15:01:19 +02:00
Ines Montani	1aa8e8f2af	Update docs [ci skip]	2020-10-14 14:58:45 +02:00
Ines Montani	4d99d2b94a	Update docs [ci skip]	2020-10-13 11:38:52 +02:00
svlandeg	40276fd3be	update NEL docs after latest refactor	2020-10-12 11:41:27 +02:00
svlandeg	08cb085f6c	Merge remote-tracking branch 'upstream/develop' into fix/various	2020-10-09 17:01:27 +02:00
Ines Montani	97ff090e49	Fix docs example [ci skip]	2020-10-09 16:03:57 +02:00
Ines Montani	9fb3244672	Merge pull request #6231 from adrianeboyd/feature/include-static-vectors	2020-10-09 15:54:52 +02:00
Adriane Boyd	2dd79454af	Update docs	2020-10-09 14:42:07 +02:00
svlandeg	853edace37	fix MultiHashEmbed example in documentation	2020-10-09 14:11:06 +02:00
Ines Montani	e50dc2c1c9	Update docs [ci skip]	2020-10-09 12:04:52 +02:00
Ines Montani	7c52def5da	Merge pull request #6227 from adrianeboyd/chore/update-3.0.0a36-from-master	2020-10-09 10:49:20 +02:00
Ines Montani	329b61ee7b	Update docs [ci skip]	2020-10-09 10:36:06 +02:00
delzac	668507be1b	Reflect on usage doc that IS_SENT_START attribute exist (#6114) * Reflect on usage doc that IS_SENT_START attribute exist * Create delzac.md	2020-10-09 10:14:40 +02:00
Sofie Van Landeghem	d093d6343b	TrainablePipe (#6213) * rename Pipe to TrainablePipe * split functionality between Pipe and TrainablePipe * remove unnecessary methods from certain components * cleanup * hasattr(component, "pipe") should be sufficient again * remove serialization and vocab/cfg from Pipe * unify _ensure_examples and validate_examples * small fixes * hasattr checks for self.cfg and self.vocab * make is_resizable and is_trainable properties * serialize strings.json instead of vocab * fix KB IO + tests * fix typos * more typos * _added_strings as a set * few more tests specifically for _added_strings field * bump to 3.0.0a36	2020-10-08 21:33:49 +02:00
Ines Montani	5ebd1fc2cf	Update docs [ci skip]	2020-10-08 16:23:12 +02:00
Ines Montani	d1602e1ece	Update docs [ci skip]	2020-10-08 11:56:50 +02:00
Ines Montani	064575d79d	Merge pull request #6216 from svlandeg/feature/nel-initialize	2020-10-08 11:14:12 +02:00
Ines Montani	43e59bb22a	Update docs and install extras [ci skip]	2020-10-08 10:58:50 +02:00
svlandeg	eaf5c265cb	set_kb method for entity_linker	2020-10-08 10:34:01 +02:00
svlandeg	bcaad28eda	fix typos	2020-10-07 13:05:37 +02:00
delzac	15ea401b39	Reflect on usage doc that IS_SENT_START attribute exist (#6114) * Reflect on usage doc that IS_SENT_START attribute exist * Create delzac.md	2020-10-06 15:11:01 +02:00
Ines Montani	ce14520789	Update docs [ci skip]	2020-10-06 14:35:17 +02:00
Ines Montani	2a17566da3	Update docs [ci skip]	2020-10-06 14:15:08 +02:00
Ines Montani	967377287a	Merge pull request #6210 from adrianeboyd/docs/various-v3-3 [ci skip]	2020-10-06 11:28:45 +02:00
Adriane Boyd	aa9c9f3bf0	Update Chinese usage for spacy-pkuseg	2020-10-06 11:21:17 +02:00
Ines Montani	2fd7122074	Update docs [ci skip]	2020-10-06 10:31:48 +02:00
Ines Montani	568e12215d	Merge pull request #6206 from svlandeg/fix/patterns-init	2020-10-06 10:27:23 +02:00
Ines Montani	2e961817cb	Update docs [ci skip]	2020-10-06 10:23:01 +02:00
svlandeg	9b4cf7b0b6	update output of debug config command	2020-10-06 09:47:23 +02:00
svlandeg	fd0f60e2bc	updates to data format for training and pretraining	2020-10-06 09:28:53 +02:00
svlandeg	ff9ac39c88	read entity_ruler patterns with srsly.read_jsonl.v1	2020-10-05 22:50:14 +02:00
Ines Montani	1a554bdcb1	Update docs and docstring [ci skip]	2020-10-05 21:55:27 +02:00
Ines Montani	181039bd17	Merge pull request #6205 from explosion/feature/embed-features	2020-10-05 21:49:10 +02:00
Ines Montani	706b7f6973	Update docs	2020-10-05 20:51:22 +02:00
Matthew Honnibal	919790cb47	Upd MultiHashEmbed docs	2020-10-05 20:28:21 +02:00
svlandeg	193e0d5a98	add docs for entity_ruler.initialize	2020-10-05 18:04:08 +02:00
svlandeg	65abd77779	add finish_update to Pipe	2020-10-05 16:23:33 +02:00
Ines Montani	e3acad6264	Update docs [ci skip]	2020-10-05 13:06:20 +02:00
Ines Montani	0f64556c04	Merge pull request #6197 from svlandeg/feature/pipe-docs [ci skip]	2020-10-05 11:55:40 +02:00
svlandeg	9a6c9b133b	various small fixes	2020-10-05 01:05:37 +02:00
svlandeg	52b660e9dc	initialize and update explanation	2020-10-05 00:39:36 +02:00
Ines Montani	3c36a57e84	Update data augmenters (#6196) * Draft lower-case augmenter * Make warning a debug log * Update lowercase augmenter, docs and tests Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>	2020-10-04 17:46:29 +02:00
svlandeg	b0463fbf75	set_annotations explanation	2020-10-04 14:56:48 +02:00
Ines Montani	43d7652635	Merge pull request #6192 from explosion/feature/init-attr-ruler	2020-10-04 14:46:37 +02:00
Ines Montani	9b3a934361	Update docs [ci skip]	2020-10-04 14:14:55 +02:00
svlandeg	9f40d963fd	highlight the two steps: the model and the pipeline component	2020-10-04 14:11:53 +02:00
Ines Montani	11347f34da	Tidy up, tests and docs	2020-10-04 13:54:05 +02:00
svlandeg	452b8309f9	slight rewrite to hide some thinc implementation details	2020-10-04 13:26:46 +02:00
svlandeg	08ad349a18	tok2vec layer	2020-10-04 00:08:02 +02:00
svlandeg	2c4b2ee5e9	REL intro and get_candidates function	2020-10-03 23:27:05 +02:00
Ines Montani	989c59918c	Update docs [ci skip]	2020-10-03 18:53:39 +02:00
Ines Montani	7c4ab7e82c	Fix Lemmatizer.get_lookups_config	2020-10-03 17:16:10 +02:00
Ines Montani	dd542ec6a4	Fix label initialization of textcat component (#6190)	2020-10-03 17:07:38 +02:00
Ines Montani	3b8f352eda	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2020-10-03 16:08:27 +02:00
Ines Montani	35d695a031	Update docs	2020-10-03 16:08:24 +02:00
Matthew Honnibal	db419f6b2f	Improve control of training progress and logging (#6184) * Make logging and progress easier to control * Update docs * Cleanup errors * Fix ConfigValidationError * Pass stdout/stderr, not wasabi.Printer * Fix type * Upd logging example * Fix logger example * Fix type	2020-10-03 14:57:46 +02:00
Ines Montani	5fb776556a	Update docs [ci skip]	2020-10-03 14:47:02 +02:00
Ines Montani	5413358ba1	Merge pull request #6188 from svlandeg/feature/small-fixes	2020-10-03 11:44:24 +02:00
Ines Montani	eb9b3ff9c5	Update install docs and quickstarts [ci skip]	2020-10-03 11:35:42 +02:00
svlandeg	02247cccaf	Merge remote-tracking branch 'upstream/develop' into feature/small-fixes	2020-10-02 20:48:11 +02:00
Sofie Van Landeghem	09dcb75076	small UX fix for DocBin (#6167) * add informative warning when messing up store_user_data DocBin flags * add informative warning when messing up store_user_data DocBin flags * cleanup test * rename to patterns_path	2020-10-02 15:43:32 +02:00
Ines Montani	f0b30aedad	Make lemmatizers use initialize logic (#6182) * Make lemmatizer use initialize logic and tidy up * Fix typo * Raise for uninitialized tables	2020-10-02 15:42:36 +02:00
Ines Montani	df06f7a792	Update docs [ci skip]	2020-10-02 13:24:33 +02:00
Ines Montani	d2aa662ab2	Merge pull request #6179 from adrianeboyd/feature/token-morph-refactor-2 [ci skip]	2020-10-02 12:10:27 +02:00
Ines Montani	0f11c2150d	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2020-10-02 11:38:05 +02:00
Ines Montani	32cdc1c4f4	Update docs [ci skip]	2020-10-02 11:38:03 +02:00
Ines Montani	6d8df081bd	Merge pull request #6180 from adrianeboyd/docs/minor-v3-2 [ci skip]	2020-10-02 11:37:25 +02:00
Adriane Boyd	351f352cdc	Update Japanese docs and pin for sudachipy	2020-10-02 10:12:44 +02:00
Adriane Boyd	7670df04dd	Update Chinese usage docs	2020-10-02 10:09:03 +02:00
Adriane Boyd	3908fff899	Remove tag map sidebar	2020-10-02 09:07:55 +02:00
Adriane Boyd	fd09e6b140	Update docs for Token.morph / Token.set_morph	2020-10-02 09:05:15 +02:00
Ines Montani	01c1538c72	Integrate file readers	2020-10-02 01:36:06 +02:00
Ines Montani	6b94cee468	Fix docs [ci skip]	2020-10-02 01:11:19 +02:00
Ines Montani	b6b73a3ca8	Update docs [ci skip]	2020-10-01 17:45:29 +02:00
Ines Montani	f2627157c8	Update docs [ci skip]	2020-10-01 17:38:17 +02:00
svlandeg	1328c9fd14	consistently use --code instead of --code-path	2020-10-01 16:59:22 +02:00
Sofie Van Landeghem	a22215f427	Add FeatureExtractor from Thinc (#6170) * move featureextractor from Thinc * Update website/docs/api/architectures.md Co-authored-by: Ines Montani <ines@ines.io> * Update website/docs/api/architectures.md Co-authored-by: Ines Montani <ines@ines.io> Co-authored-by: Ines Montani <ines@ines.io>	2020-10-01 16:22:48 +02:00
Ines Montani	0a8a124a6e	Update docs [ci skip]	2020-10-01 12:15:53 +02:00
Ines Montani	a103ab5f1a	Update augmenter lookups and docs	2020-09-30 23:03:47 +02:00
Ines Montani	115481aca7	Update docs [ci skip]	2020-09-30 15:16:00 +02:00
walterhenry	1c65b3b2c0	Proofreading A few more small things in Usage.	2020-09-30 11:33:40 +02:00
Ines Montani	469f0e539c	Fix docs [ci skip]	2020-09-30 10:24:06 +02:00
Ines Montani	9bb958fd0a	Fix debug data [ci skip]	2020-09-29 23:07:11 +02:00
Ines Montani	604be54a5c	Support --code in evaluate CLI [ci skip]	2020-09-29 21:20:56 +02:00
Ines Montani	d3c63b7965	Merge branch 'develop' into feature/prepare	2020-09-29 20:53:05 +02:00
Ines Montani	361f91e286	Merge pull request #6135 from walterhenry/develop-proof	2020-09-29 20:49:06 +02:00
Ines Montani	b486389eec	Update website/docs/api/doc.md	2020-09-29 20:48:43 +02:00
Ines Montani	d7469283c5	Update docs [ci skip]	2020-09-29 16:59:21 +02:00
Sofie Van Landeghem	6a04e5adea	encoding UTF8 (#6161)	2020-09-29 14:49:55 +02:00
walterhenry	1d80b3dc1b	Proofreading Finished with the API docs and started on the Usage, but Embedding & Transformers	2020-09-29 12:39:10 +02:00
walterhenry	c1c841940c	Merge branch 'develop-proof' of https://github.com/walterhenry/spaCy into develop-proof	2020-09-29 11:47:43 +02:00
svlandeg	64d90039a1	encoding UTF8	2020-09-29 10:54:42 +02:00
Ines Montani	ff9a63bfbd	begin_training -> initialize	2020-09-28 21:35:09 +02:00
walterhenry	3360825e00	Proofreading Another round of proofreading. All the API docs have been read through and I've grazed the Usage docs.	2020-09-28 16:50:15 +02:00
Matthew Honnibal	a976da168c	Support data augmentation in Corpus (#6155) * Support data augmentation in Corpus * Note initial docs for data augmentation * Add augmenter to quickstart * Fix flake8 * Format * Fix test * Update spacy/tests/training/test_training.py * Improve data augmentation arguments * Update templates * Move randomization out into caller * Refactor * Update spacy/training/augment.py * Update spacy/tests/training/test_training.py * Fix augment * Fix test	2020-09-28 03:03:27 +02:00
Ines Montani	f29d5b9b89	Update docs [ci skip]	2020-09-27 18:39:38 +02:00
Ines Montani	e06ff8b71d	Update docs [ci skip]	2020-09-26 13:18:08 +02:00
Sofie Van Landeghem	009ba14aaf	Fix pretraining in train script (#6143) * update pretraining API in train CLI * bump thinc to 8.0.0a35 * bump to 3.0.0a26 * doc fixes * small doc fix	2020-09-25 15:47:10 +02:00
Ines Montani	2aa4d65734	Update docs [ci skip]	2020-09-24 20:41:09 +02:00
Adriane Boyd	3c062b3911	Add MORPH handling to Matcher (#6107) * Add MORPH handling to Matcher * Add `MORPH` to `Matcher` schema * Rename `_SetMemberPredicate` to `_SetPredicate` * Add `ISSUBSET` and `ISSUPERSET` operators to `_SetPredicate` * Add special handling for normalization and conversion of morph values into sets * For other attrs, `ISSUBSET` acts like `IN` and `ISSUPERSET` only matches for 0 or 1 values * Update test * Rename to IS_SUBSET and IS_SUPERSET	2020-09-24 16:55:09 +02:00
Sofie Van Landeghem	c7eedd3534	updates to NEL functionality (#6132) * NEL: read sentences and ents from reference * fiddling with sent_start annotations * add KB serialization test * KB write additional file with strings.json * score_links function to calculate NEL P/R/F * formatting * documentation	2020-09-24 16:53:59 +02:00
Ines Montani	58dde293ce	Merge pull request #6089 from adrianeboyd/feature/doc-ents-v3-2	2020-09-24 14:44:42 +02:00
Ines Montani	74e1f192b4	Merge pull request #6134 from explosion/feature/training_before_to_disk	2020-09-24 14:44:11 +02:00
Ines Montani	3b58a8be2b	Update docs	2020-09-24 14:32:42 +02:00
Ines Montani	88e54caa12	accuracy -> performance	2020-09-24 14:32:35 +02:00
Ines Montani	b92c8aae78	Merge branch 'develop' into pr/6135	2020-09-24 13:44:56 +02:00
Ines Montani	6836b66433	Update docs and resolve todos [ci skip]	2020-09-24 13:41:25 +02:00
walterhenry	3dd5f409ec	Proofreading Proofread some API docs	2020-09-24 13:15:28 +02:00
Adriane Boyd	1c63f02f99	Add API docs	2020-09-24 12:51:16 +02:00
Ines Montani	138c8d45db	Update docs	2020-09-24 12:43:39 +02:00
Ines Montani	d7ab6a2ffe	Update docs [ci skip]	2020-09-24 12:37:21 +02:00
Ines Montani	ae51f580c1	Fix handling of score_weights	2020-09-24 10:27:33 +02:00
Ines Montani	e2ffe51fb5	Update docs [ci skip]	2020-09-24 10:13:41 +02:00
Ines Montani	02008e9a55	Update docs [ci skip]	2020-09-23 22:02:31 +02:00
Ines Montani	c8bda92243	Update benchmarks [ci skip]	2020-09-23 20:05:02 +02:00
svlandeg	35dbc63578	Merge remote-tracking branch 'upstream/develop' into fix/nr_features # Conflicts: # spacy/ml/models/parser.py # spacy/tests/serialize/test_serialize_config.py # website/docs/api/architectures.md	2020-09-23 17:01:13 +02:00
svlandeg	dd2292793f	'parser' instead of 'deps' for state_type	2020-09-23 16:53:49 +02:00
Ines Montani	50a4425cda	Adjust docs	2020-09-23 16:03:32 +02:00
Ines Montani	e4e7f5b00d	Update docs [ci skip]	2020-09-23 15:44:40 +02:00
svlandeg	6c85fab316	state_type and extra_state_tokens instead of nr_feature_tokens	2020-09-23 13:35:09 +02:00
Ines Montani	6ca06cb62c	Update docs and formatting [ci skip]	2020-09-23 10:14:27 +02:00
Ines Montani	60a317520a	Merge pull request #6109 from svlandeg/feature/2rename	2020-09-23 09:47:12 +02:00
Ines Montani	930b116f00	Update docs [ci skip]	2020-09-23 09:35:21 +02:00
svlandeg	b556a10808	rename converts in_to_out	2020-09-22 11:50:19 +02:00
Ines Montani	f9af7d365c	Update docs [ci skip]	2020-09-22 09:45:41 +02:00
Ines Montani	49e80dbcac	Merge pull request #6103 from explosion/chore/tidy-up-tests-docs-get-doc	2020-09-22 09:45:04 +02:00
Adriane Boyd	844db6ff12	Update architecture overview	2020-09-22 09:31:47 +02:00
Adriane Boyd	5fbb8dfcbc	Merge remote-tracking branch 'upstream/develop' into docs/various-v3-2	2020-09-22 09:22:58 +02:00
Ines Montani	67fbcb3da5	Tidy up tests and docs	2020-09-21 20:43:54 +02:00
Ines Montani	a5f6ab4943	Merge pull request #6098 from adrianeboyd/feature/doc-init	2020-09-21 18:35:20 +02:00
Adriane Boyd	f212303729	Add sent_starts to Doc.__init__ Add sent_starts to `Doc.__init__`. Officially specify `is_sent_start` values but also convert to and accept `sent_start` internally.	2020-09-21 17:59:09 +02:00
Adriane Boyd	6aa91c7ca0	Make user_data keyword-only	2020-09-21 16:00:06 +02:00
Ines Montani	e548654aca	Update docs [ci skip]	2020-09-21 14:46:55 +02:00
Adriane Boyd	bc02e86494	Extend Doc.__init__ with additional annotation Mostly copying from `spacy.tests.util.get_doc`, add additional kwargs to `Doc.__init__` to initialize the most common doc/token values.	2020-09-21 13:36:24 +02:00
Ines Montani	9d32cac736	Update docs [ci skip]	2020-09-21 10:55:36 +02:00
Adriane Boyd	cc71ec901f	Fix typo in saving and loading usage docs	2020-09-21 09:08:55 +02:00
Adriane Boyd	3aa57ce6c9	Update alignment mode in Doc.char_span docs	2020-09-21 09:07:20 +02:00
Ines Montani	012b3a7096	Update docs [ci skip]	2020-09-20 17:44:58 +02:00
Ines Montani	554c9a2497	Update docs [ci skip]	2020-09-20 12:30:53 +02:00
Sofie Van Landeghem	39872de1f6	Introducing the gpu_allocator (#6091) * rename 'use_pytorch_for_gpu_memory' to 'gpu_allocator' * --code instead of --code-path * update documentation * avoid querying the "system" section directly * add explanation of gpu_allocator to TF/PyTorch section in docs * fix typo * fix typo 2 * use set_gpu_allocator from thinc 8.0.0a34 * default null instead of empty string	2020-09-19 01:17:02 +02:00
Ines Montani	0406200a1e	Update docs [ci skip]	2020-09-18 15:13:13 +02:00
Ines Montani	a127fa475e	Merge pull request #6078 from svlandeg/fix/corpus	2020-09-18 14:44:21 +02:00
Ines Montani	d32ce121be	Fix docs [ci skip]	2020-09-18 13:41:12 +02:00
Ines Montani	a0b4389a38	Update docs [ci skip]	2020-09-17 19:24:48 +02:00
Matthew Honnibal	6efb7688a6	Draft pretrain usage	2020-09-17 18:17:03 +02:00
Ines Montani	1bb8b4f824	Merge branch 'master' into develop	2020-09-17 17:46:20 +02:00
Ines Montani	6bd0d25fb9	Merge pull request #6085 from explosion/docs/static-vectors-intro [ci skip]	2020-09-17 17:14:45 +02:00
Ines Montani	a2c8cda26f	Update docs [ci skip]	2020-09-17 17:12:51 +02:00
Ines Montani	2e3ce9f42f	Merge branch 'feature/init-config-pretrain' of https://github.com/svlandeg/spaCy into pr/6084	2020-09-17 16:58:49 +02:00
Ines Montani	3d8e010655	Change order	2020-09-17 16:58:46 +02:00
Ines Montani	c4b414b282	Update website/docs/api/cli.md	2020-09-17 16:58:09 +02:00
Sofie Van Landeghem	e5ceec5df0	Update website/docs/api/cli.md Co-authored-by: Ines Montani <ines@ines.io>	2020-09-17 16:56:20 +02:00
Sofie Van Landeghem	127ce0c574	Update website/docs/api/cli.md Co-authored-by: Ines Montani <ines@ines.io>	2020-09-17 16:55:53 +02:00
Matthew Honnibal	ec751068f3	Draft text for static vectors intro	2020-09-17 16:42:53 +02:00
svlandeg	5fade4feb7	fix cli abbrev	2020-09-17 16:15:20 +02:00
svlandeg	ddfc1fc146	add pretraining option to init config	2020-09-17 16:05:40 +02:00
svlandeg	c8c84f1ccd	Merge remote-tracking branch 'upstream/develop' into fix/corpus	2020-09-17 15:43:04 +02:00
svlandeg	130ffa5fbf	fix typos in docs	2020-09-17 14:59:41 +02:00
Ines Montani	c8fa2247e3	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2020-09-17 12:34:15 +02:00
Ines Montani	6761028c6f	Update docs [ci skip]	2020-09-17 12:34:11 +02:00
svlandeg	0c35885751	generalize corpora, dot notation for dev and train corpus	2020-09-17 11:38:59 +02:00
svlandeg	8cedb2f380	Merge branch 'fix/corpus' of https://github.com/svlandeg/spaCy into fix/corpus	2020-09-17 09:27:55 +02:00
svlandeg	781fae678b	Merge remote-tracking branch 'upstream/develop' into fix/corpus	2020-09-17 09:24:36 +02:00
Sofie Van Landeghem	21dcf92964	Update website/docs/api/data-formats.md Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>	2020-09-17 09:21:36 +02:00
Adriane Boyd	7e4cd7575c	Refactor Docs.is_ flags (#6044) * Refactor Docs.is_ flags * Add derived `Doc.has_annotation` method * `Doc.has_annotation(attr)` returns `True` for partial annotation * `Doc.has_annotation(attr, require_complete=True)` returns `True` for complete annotation * Add deprecation warnings to `is_tagged`, `is_parsed`, `is_sentenced` and `is_nered` * Add `Doc._get_array_attrs()`, which returns a full list of `Doc` attrs for use with `Doc.to_array`, `Doc.to_bytes` and `Doc.from_docs`. The list is the `DocBin` attributes list plus `SPACY` and `LENGTH`. Notes on `Doc.has_annotation`: * `HEAD` is converted to `DEP` because heads don't have an unset state * Accept `IS_SENT_START` as a synonym of `SENT_START` Additional changes: * Add `NORM`, `ENT_ID` and `SENT_START` to default attributes for `DocBin` * In `Doc.from_array()` the presence of `DEP` causes `HEAD` to override `SENT_START` * In `Doc.from_array()` using `attrs` other than `Doc._get_array_attrs()` (i.e., a user's custom list rather than our default internal list) with both `HEAD` and `SENT_START` shows a warning that `HEAD` will override `SENT_START` * `set_children_from_heads` does not require dependency labels to set sentence boundaries and sets `sent_start` for all non-sentence starts to `-1` * Fix call to set_children_form_heads Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>	2020-09-17 00:14:01 +02:00
svlandeg	55f8d5478e	fix example output	2020-09-15 22:09:30 +02:00
svlandeg	51fa929f47	rewrite train_corpus to corpus.train in config	2020-09-15 21:58:04 +02:00
Ines Montani	2214d1bb7b	Merge pull request #6067 from explosion/feature/spacy-blank-from-config	2020-09-15 14:18:33 +02:00
Ines Montani	b7faa38960	Update docs [ci skip]	2020-09-15 12:44:03 +02:00
Ines Montani	0edd695bf6	Update docs	2020-09-15 11:41:49 +02:00
Ines Montani	99549a5ace	Fix consistency and update docs	2020-09-15 11:37:37 +02:00
Ines Montani	154752f9c2	Update docs and consistency [ci skip]	2020-09-15 00:32:49 +02:00
Sofie Van Landeghem	3216a33149	positive_label config for textcat (#6062) * hook up positive_label in textcat * unit tests * documentation * formatting * tests * fix typo * move verify_config to after begin_training * revert accidential commit	2020-09-14 17:08:00 +02:00
Ines Montani	b854e0bef9	Update styleguide [ci skip]	2020-09-14 11:25:57 +02:00
Ines Montani	9afb1d9965	Merge pull request #6063 from svlandeg/feature/doc_cleanup [ci skip]	2020-09-14 10:35:43 +02:00
Ines Montani	85e5910102	Update docs [ci skip]	2020-09-13 23:09:19 +02:00
Ines Montani	5ebb2a2ac8	Update docs [ci skip]	2020-09-13 22:36:20 +02:00
Ines Montani	47acb45850	Update docs [ci skip]	2020-09-13 22:30:33 +02:00
Ines Montani	2e3d067a7b	Update docs [ci skip]	2020-09-13 19:29:06 +02:00
Ines Montani	99b26fe492	Update docs [ci skip]	2020-09-13 17:59:38 +02:00
Sofie Van Landeghem	744df9814a	define threshold for scoring textcat in TextCat config (#6055) * define threshold for scoring textcat in TextCat config * fix unit test and documentation	2020-09-13 14:15:52 +02:00
Ines Montani	1316071086	Update docs [ci skip]	2020-09-13 11:31:50 +02:00
Ines Montani	368ecf705a	Update docs [ci skip]	2020-09-12 17:40:50 +02:00

1874 Commits