spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-02-14 03:00:40 +03:00

Author	SHA1	Message	Date
Sofie Van Landeghem	c0f4a1e43b	train is from-config by default (#5575) * verbose and tag_map options * adding init_tok2vec option and only changing the tok2vec that is specified * adding omit_extra_lookups and verifying textcat config * wip * pretrain bugfix * add replace and resume options * train_textcat fix * raw text functionality * improve UX when KeyError or when input data can't be parsed * avoid unnecessary access to goldparse in TextCat pipe * save performance information in nlp.meta * add noise_level to config * move nn_parser's defaults to config file * multitask in config - doesn't work yet * scorer offering both F and AUC options, need to be specified in config * add textcat verification code from old train script * small fixes to config files * clean up * set default config for ner/parser to allow create_pipe to work as before * two more test fixes * small fixes * cleanup * fix NER pickling + additional unit test * create_pipe as before	2020-06-12 02:02:07 +02:00
Ines Montani	810fce3bb1	Merge branch 'develop' into master-tmp	2020-06-03 14:36:59 +02:00
Ines Montani	262d306eaa	unicode -> str consistency	2020-05-24 17:23:00 +02:00
Ines Montani	5d3806e059	unicode -> str consistency	2020-05-24 17:20:58 +02:00
Ines Montani	24f72c669c	Merge branch 'develop' into master-tmp	2020-05-21 18:39:06 +02:00
Sofie Van Landeghem	0d94737857	Feature toggle_pipes (#5378) * make disable_pipes deprecated in favour of the new toggle_pipes * rewrite disable_pipes statements * update documentation * remove bin/wiki_entity_linking folder * one more fix * remove deprecated link to documentation * few more doc fixes * add note about name change to the docs * restore original disable_pipes * small fixes * fix typo * fix error number to W096 * rename to select_pipes * also make changes to the documentation Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>	2020-05-18 22:27:10 +02:00
Ines Montani	f333c2a011	Merge pull request #5386 from svlandeg/fix/nel-docs	2020-05-10 12:00:09 +02:00
adrianeboyd	4a15b559ba	Clarify Token.pos as UPOS (#5419)	2020-05-08 10:36:25 +02:00
adrianeboyd	a2345618f1	Fix Token API docs from #5375 (#5418)	2020-05-08 10:25:02 +02:00
svlandeg	ebaed7dcfa	Few more updates to the EL documentation	2020-04-30 10:17:06 +02:00
adrianeboyd	bdff76dede	Various updates/additions to CLI scripts (#5362) * `debug-data`: determine coverage of provided vectors * `evaluate`: support `blank:lg` model to make it possible to just evaluate tokenization * `init-model`: add option to truncate vectors to N most frequent vectors from word2vec file * `train`: * if training on GPU, only run evaluation/timing on CPU in the first iteration * if training is aborted, exit with a non-0 exit status	2020-04-29 12:56:46 +02:00
Sofie Van Landeghem	cfdaf99b80	Fix passing of component configuration (#5374) * add kwargs to to_disk methods in docs - otherwise crashes on 'exclude' argument * add fix and test for Issue 5137	2020-04-29 12:56:17 +02:00
Sofie Van Landeghem	f67343295d	Update NEL examples and documentation (#5370) * simplify creation of KB by skipping dim reduction * small fixes to train EL example script * add KB creation and NEL training example scripts to example section * update descriptions of example scripts in the documentation * moving wiki_entity_linking folder from bin to projects * remove test for wiki NEL functionality that is being moved	2020-04-29 12:53:53 +02:00
adrianeboyd	a6e521cd79	Add is_sent_end token property (#5375) Reconstruction of the original PR #4697 by @MiniLau. Removes unused `SENT_END` symbol and `IS_SENT_END` from `Matcher` schema because the Matcher is only going to be able to support `IS_SENT_START`.	2020-04-29 12:53:16 +02:00
adrianeboyd	90ce34db42	Add cuda101 and cuda102 options to setup (#5377) * Add cuda101 and cuda102 options to setup * Update cudaNNN options in docs	2020-04-29 12:51:12 +02:00
adrianeboyd	792aa7b6ab	Remove references to textcat spans (#5360) Remove references to unimplemented `TextCategorizer` span labels in `GoldParse` and `Doc`.	2020-04-27 18:01:12 +02:00
adrianeboyd	90c754024f	Update nlp.vectors to nlp.vocab.vectors (#5357)	2020-04-27 10:53:05 +02:00
Mike	481574cbc8	[minor doc change] embedding vis. link is broken in `website/docs/usage/examples.md` (#5325) * The embedding vis. link is broken The first link seems to be reasonable for now unless someone has an updated embedding vis they want to share? * contributor agreement * Update Mlawrence95.md * Update website/docs/usage/examples.md Co-Authored-By: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2020-04-21 20:35:12 +02:00
laszabine	fb73d4943a	Amend documentation to Language.evaluate (#5319) * Specified usage of arguments to Language.evaluate * Created contributor agreement	2020-04-16 20:00:18 +02:00
Sofie Van Landeghem	a3965ec13d	tag-map-path since 2.2.4 instead of 2.2.3 (#5289)	2020-04-14 14:53:47 +02:00
Marek Grzenkowicz	6a8a52650f	[Closes #5292] Fix typo in option name "--n-save_every" (#5293) * Sign contributor agreement for chopeen * Fix typo in option name and close #5292	2020-04-11 23:35:01 +02:00
Sofie Van Landeghem	1137420840	Small doc fixes (#5250) * fix link * torchtext instead tochtext	2020-04-03 13:01:43 +02:00
Nikhil Saldanha	d1ddfa1cb7	update docs for EntityRecognizer.predict return type was wrongly written as a tuple, changed to syntax.StateClass	2020-03-28 18:13:02 +01:00
Sofie Van Landeghem	9b412516e7	Fixing pickling of the parser (#5218) * fix __reduce__ for pickling parser * setting the move object as 'state' during pickling * unskip test_issue4725 - works again	2020-03-27 19:35:26 +01:00
Ines Montani	46568f40a7	Merge branch 'master' into tmp/sync	2020-03-26 13:38:14 +01:00
Tiljander	e53232533b	Describing priority rules for overlapping matches (#5197) * Describing priority rules for overlapping matches * Create Tiljander.md * Describing priority rules for overlapping matches * Update website/docs/api/entityruler.md Co-Authored-By: Ines Montani <ines@ines.io> Co-authored-by: Ines Montani <ines@ines.io>	2020-03-26 13:13:22 +01:00
adrianeboyd	d88a377bed	Remove Vectors.from_glove (#5209)	2020-03-26 10:45:47 +01:00
Ines Montani	17bd9ed84f	Merge pull request #5153 from pinealan/fix/website-docs Fix website typos and weird sentences	2020-03-16 15:03:01 +01:00
Alan Chan	2124be100d	Tweak run-on sentence	2020-03-15 03:45:20 +08:00
Alan Chan	7c3a4ce933	Missing word in api/cli doc	2020-03-15 03:45:20 +08:00
Alan Chan	36e3532475	Remove unfinished sentence	2020-03-15 03:45:17 +08:00
Mark Abraham	a0ffa346c0	Fix broken link in docs	2020-03-13 14:07:26 +01:00
Ines Montani	c669435c62	Merge pull request #5125 from renaud/patch-1 small typo in code sample	2020-03-12 11:19:12 +01:00
svlandeg	1724a4f75b	additional information if doc is empty	2020-03-09 18:08:18 +01:00
Renaud Richardet	eccf6b1686	small typo in code sample	2020-03-09 14:49:11 +01:00
Ines Montani	1d6aec805d	Fix formatting and update docs for v2.2.4	2020-03-09 11:17:20 +01:00
Ines Montani	acb4e3c7ba	Merge pull request #5039 from adrianeboyd/typo/website-token-api-shape Fix formatting in Token API	2020-02-25 14:57:25 +01:00
Sofie Van Landeghem	479bd8d09f	add lemma option to displacy 'dep' visualiser (#5041) * add lemma option to displacy 'dep' visualiser * more compact list comprehension * add option to doc * fix test and add lemmas to util.get_doc * fix capital * remove lemma from get_doc * cleanup	2020-02-22 14:11:51 +01:00
Adriane Boyd	3853d385fa	Fix formatting in Token API	2020-02-20 13:41:24 +01:00
Ines Montani	de11ea753a	Merge branch 'master' into develop	2020-02-18 14:47:23 +01:00
Kabir Khan	f6ed07b85c	Use nlp.pipe in EntityRuler for phrase patterns in add_patterns (#4931) * Fix ent_ids and labels properties when id attribute used in patterns * use set for labels * sort end_ids for comparison in entity_ruler tests * fixing entity_ruler ent_ids test * add to set * Run make_doc optimistically if using phrase matcher patterns. * remove unused coveragerc I was testing with * format * Refactor EntityRuler.add_patterns to use nlp.pipe for phrase patterns. Improves speed substantially. * Removing old add_patterns function * Fixing spacing * Make sure token_patterns loaded as well, before generator was being emptied in from_disk	2020-02-16 18:17:47 +01:00
Julin S	479e81bafc	fix link (#4977)	2020-02-10 20:31:26 -05:00
Ines Montani	9c08d9baa3	Remove old sections [ci skip] (closes #4961 )	2020-02-03 13:10:46 +01:00
Ines Montani	abd5c06374	Adjust formatting [ci skip]	2020-02-03 13:00:02 +01:00
Martin A. Kayser	02a44c5be2	Adding a note on retrieving the string rep of the match_id (#4904) Stolen from here: https://stackoverflow.com/questions/47638877/using-phrasematcher-in-spacy-to-find-multiple-match-types	2020-02-03 12:58:58 +01:00
adrianeboyd	7ad000fce7	Update docs for train CLI --use_gpu option (#4927)	2020-01-20 17:02:47 +01:00
Preston Badeer	b216ff43c9	Update vectors-similarity.md (#4889) These links are broken on the website, due to quotes around the URLs.	2020-01-08 16:49:40 +01:00
Geoffrey Gordon Ashbrook	53929138d7	remove extra word typo (#4875) "let you find you"	2020-01-06 12:37:42 +01:00
Ines Montani	400257a802	Update index.md [ci skip]	2020-01-04 01:52:18 +01:00
Ivan Echevarria	ef13e0c038	Add n_process to Language.pipe documentation (#4842) [ci skip] * Add n_process to documentation * Auto-format and add default [ci skip] Co-authored-by: Ines Montani <ines@ines.io>	2019-12-29 14:23:33 +01:00

755 Commits