spaCy

mirror of https://github.com/explosion/spaCy.git synced 2025-04-04 09:14:12 +03:00

Author	SHA1	Message	Date
Ines Montani	52728d8fa3	Merge branch 'develop' into master-tmp	2020-06-20 15:52:00 +02:00
Adriane Boyd	931d80de72	Warning for sudachipy 0.4.5 (#5611)	2020-06-19 12:43:41 +02:00
Ines Montani	6d712f3e06	Merge pull request #5599 from adrianeboyd/docs/v2.3.0-minor	2020-06-16 13:49:25 -07:00
Adriane Boyd	02369f91d3	Fix spacy convert argument	2020-06-16 20:41:17 +02:00
Adriane Boyd	f0fd77648f	Change example title to Dr. Change example title to Dr. so the current model does exclude the title in the initial example.	2020-06-16 20:36:21 +02:00
Adriane Boyd	a6abdfbc3c	Fix numpy.zeros() dtype for Doc.from_array	2020-06-16 20:35:45 +02:00
Adriane Boyd	9aff317ca7	Update POS in tagging example	2020-06-16 20:26:57 +02:00
Adriane Boyd	457babfa0c	Update alignment example for new gold.align	2020-06-16 20:22:03 +02:00
Ines Montani	41003a5117	Update Binder version [ci skip]	2020-06-16 17:41:23 +02:00
Ines Montani	fd89f44c0c	Update Binder URL [ci skip]	2020-06-16 17:34:26 +02:00
Ines Montani	44af53bdd9	Add pkuseg warnings and auto-format [ci skip]	2020-06-16 17:13:35 +02:00
Ines Montani	a9e5b840ee	Fix typos and auto-format [ci skip]	2020-06-16 16:38:45 +02:00
Ines Montani	e9d3e177f0	Merge branch 'master' into v2.3.x	2020-06-16 16:31:38 +02:00
Ines Montani	bb54f54369	Fix model accuracy table [ci skip]	2020-06-16 16:10:12 +02:00
Adriane Boyd	d5110ffbf2	Documentation updates for v2.3.0 (#5593) * Update website models for v2.3.0 * Add docs for Chinese word segmentation * Tighten up Chinese docs section * Merge branch 'master' into docs/v2.3.0 [ci skip] * Merge branch 'master' into docs/v2.3.0 [ci skip] * Auto-format and update version * Update matcher.md * Update languages and sorting * Typo in landing page * Infobox about token_match behavior * Add meta and basic docs for Japanese * POS -> TAG in models table * Add info about lookups for normalization * Updates to API docs for v2.3 * Update adding norm exceptions for adding languages * Add --omit-extra-lookups to CLI API docs * Add initial draft of "What's New in v2.3" * Add new in v2.3 tags to Chinese and Japanese sections * Add tokenizer to migration section * Add new in v2.3 flags to init-model * Typo * More what's new in v2.3 Co-authored-by: Ines Montani <ines@ines.io>	2020-06-16 15:37:35 +02:00
Sofie Van Landeghem	c0f4a1e43b	train is from-config by default (#5575) * verbose and tag_map options * adding init_tok2vec option and only changing the tok2vec that is specified * adding omit_extra_lookups and verifying textcat config * wip * pretrain bugfix * add replace and resume options * train_textcat fix * raw text functionality * improve UX when KeyError or when input data can't be parsed * avoid unnecessary access to goldparse in TextCat pipe * save performance information in nlp.meta * add noise_level to config * move nn_parser's defaults to config file * multitask in config - doesn't work yet * scorer offering both F and AUC options, need to be specified in config * add textcat verification code from old train script * small fixes to config files * clean up * set default config for ner/parser to allow create_pipe to work as before * two more test fixes * small fixes * cleanup * fix NER pickling + additional unit test * create_pipe as before	2020-06-12 02:02:07 +02:00
Martino Mensio	de00f967ce	adding spacy-universal-sentence-encoder (#5534) * adding spacy-universal-sentence-encoder * update affiliation * updated code example	2020-06-08 20:26:30 +02:00
Sofie Van Landeghem	4d1ba6feb4	add tag variant for 2.3 (#5542)	2020-06-04 19:16:33 +02:00
Ines Montani	810fce3bb1	Merge branch 'develop' into master-tmp	2020-06-03 14:36:59 +02:00
svlandeg	5f0a91cf37	fix conv-depth parameter	2020-05-29 09:56:29 +02:00
Rajat	8b8efa1b42	update spacy universe with my project (#5497) * added contextualSpellCheck in spacy universe meta * removed extra formatting by code * updated with permanent links * run json linter used by spacy * filled SCA * updated the description	2020-05-25 11:30:23 +02:00
Ines Montani	262d306eaa	unicode -> str consistency	2020-05-24 17:23:00 +02:00
Ines Montani	5d3806e059	unicode -> str consistency	2020-05-24 17:20:58 +02:00
Sofie Van Landeghem	ae1c179f3a	Remove the nested quote	2020-05-23 17:58:19 +02:00
Jannis	aa53ce6996	Documentation Typo Fix (#5492) * Fix typo Change 'realize' to 'realise' * Add contributer agreement	2020-05-22 19:50:26 +02:00
Matthew Honnibal	f6078d866a	Merge pull request #5121 from adrianeboyd/bugfix/revert-token-match Revert token_match priority changes from #4374 and extend token match options	2020-05-22 14:42:51 +02:00
Ines Montani	65c7e82de2	Auto-format and remove 2.3 feature [ci skip]	2020-05-22 13:50:30 +02:00
Adriane Boyd	e4a1b5dab1	Rename to url_match Rename to `url_match` and update docs.	2020-05-22 12:41:03 +02:00
Adriane Boyd	730fa493a4	Merge remote-tracking branch 'upstream/master' into bugfix/revert-token-match	2020-05-22 12:18:00 +02:00
Ines Montani	ee027de032	Update universe and display of videos [ci skip]	2020-05-21 21:54:23 +02:00
Ines Montani	53da6bd672	Add course to landing [ci skip]	2020-05-21 20:45:33 +02:00
Ines Montani	24f72c669c	Merge branch 'develop' into master-tmp	2020-05-21 18:39:06 +02:00
Kevin Lu	c7c4cd5fe1	Changed pyate code example in universe.json	2020-05-20 09:11:32 -07:00
Kevin Lu	0a5b140235	Update universe.json	2020-05-19 20:12:21 -07:00
Sofie Van Landeghem	0d94737857	Feature toggle_pipes (#5378) * make disable_pipes deprecated in favour of the new toggle_pipes * rewrite disable_pipes statements * update documentation * remove bin/wiki_entity_linking folder * one more fix * remove deprecated link to documentation * few more doc fixes * add note about name change to the docs * restore original disable_pipes * small fixes * fix typo * fix error number to W096 * rename to select_pipes * also make changes to the documentation Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>	2020-05-18 22:27:10 +02:00
Ines Montani	f333c2a011	Merge pull request #5386 from svlandeg/fix/nel-docs	2020-05-10 12:00:09 +02:00
Travis Hoppe	d4cc18b746	Added author information for NLPre (#5414) * Add author links for NLPre and update category * Add contributor statement	2020-05-08 11:28:54 +02:00
adrianeboyd	4a15b559ba	Clarify Token.pos as UPOS (#5419)	2020-05-08 10:36:25 +02:00
adrianeboyd	a2345618f1	Fix Token API docs from #5375 (#5418)	2020-05-08 10:25:02 +02:00
Adriane Boyd	565e0eef73	Add tokenizer option for token match with affixes To fix the slow tokenizer URL (#4374) and allow `token_match` to take priority over prefixes and suffixes by default, introduce a new tokenizer option for a token match pattern that's applied after prefixes and suffixes but before infixes.	2020-05-05 10:35:33 +02:00
Adriane Boyd	792c8af8cf	Merge remote-tracking branch 'upstream/master' into bugfix/revert-token-match	2020-05-05 09:25:57 +02:00
svlandeg	ebaed7dcfa	Few more updates to the EL documentation	2020-04-30 10:17:06 +02:00
adrianeboyd	bdff76dede	Various updates/additions to CLI scripts (#5362) * `debug-data`: determine coverage of provided vectors * `evaluate`: support `blank:lg` model to make it possible to just evaluate tokenization * `init-model`: add option to truncate vectors to N most frequent vectors from word2vec file * `train`: * if training on GPU, only run evaluation/timing on CPU in the first iteration * if training is aborted, exit with a non-0 exit status	2020-04-29 12:56:46 +02:00
Sofie Van Landeghem	cfdaf99b80	Fix passing of component configuration (#5374) * add kwargs to to_disk methods in docs - otherwise crashes on 'exclude' argument * add fix and test for Issue 5137	2020-04-29 12:56:17 +02:00
Ines Montani	63885c1836	Remove u string and auto-format [ci skip]	2020-04-29 12:54:57 +02:00
Sofie Van Landeghem	f67343295d	Update NEL examples and documentation (#5370) * simplify creation of KB by skipping dim reduction * small fixes to train EL example script * add KB creation and NEL training example scripts to example section * update descriptions of example scripts in the documentation * moving wiki_entity_linking folder from bin to projects * remove test for wiki NEL functionality that is being moved	2020-04-29 12:53:53 +02:00
adrianeboyd	a6e521cd79	Add is_sent_end token property (#5375) Reconstruction of the original PR #4697 by @MiniLau. Removes unused `SENT_END` symbol and `IS_SENT_END` from `Matcher` schema because the Matcher is only going to be able to support `IS_SENT_START`.	2020-04-29 12:53:16 +02:00
Ines Montani	a77754120d	Merge pull request #5177 from nlptechbook/patch-5	2020-04-29 12:52:21 +02:00
Ines Montani	1cbb272a6b	Update website/meta/universe.json	2020-04-29 12:51:44 +02:00
Ines Montani	732629b0dd	Update website/meta/universe.json	2020-04-29 12:51:37 +02:00

1680 Commits