spaCy

mirror of https://github.com/explosion/spaCy.git synced 2026-02-17 12:40:46 +03:00

Author	SHA1	Message	Date
Sofie Van Landeghem	87562e470d	fix backticks in docs (#6635)	2020-12-27 22:12:37 +01:00
Sofie Van Landeghem	8df5b7f513	fix documentation of 'path' in tokenizer.to_disk (#6634)	2020-12-27 22:01:06 +01:00
Gareth Sparks	efc229c3f4	Doc.char_span arg: alignment_mode (#6591) Currently labeled "mode", actually "alignment_mode"	2020-12-18 09:54:56 +01:00
Jeno Pizarro	a6fe35a0f9	Update universe.json	2020-12-15 21:53:20 -05:00
Jeno Pizarro	343a44abe9	Merge branch 'master' of https://github.com/explosion/spaCy	2020-12-15 21:49:46 -05:00
Ines Montani	fb43a30a71	Merge pull request #6545 from svlandeg/feature/discussions [ci skip]	2020-12-11 10:20:35 +11:00
Ines Montani	76cfd89dea	Update site.json	2020-12-11 10:19:42 +11:00
Ines Montani	43a69eecb7	Update site.json	2020-12-11 10:05:21 +11:00
svlandeg	d156b423ae	remove gitter and reddit links	2020-12-10 20:41:02 +01:00
svlandeg	5afa567767	replace gitter with discussions in 101	2020-12-10 20:17:36 +01:00
svlandeg	ae1ccf2b04	update link to discussion forum	2020-12-10 20:02:49 +01:00
Adriane Boyd	27bb75e2a0	Docs and extras updates for v2.3.5 * Update install instructions for updated packages * Add `cuda110` and `cuda111` extras, remove upper `cupy` pins (only compatible with `thinc>=7.4.4`)	2020-12-10 15:34:34 +01:00
Adriane Boyd	5ceac425ee	Remove non-working --use-chars from train CLI Remove the non-working `--use-chars` option from the train CLI. The implementation of the option across component types and the CLI settings could be fixed, but the `CharacterEmbed` model does not work on GPU in v2 so it's better to remove it.	2020-12-08 08:30:00 +01:00
Adriane Boyd	03ae77e603	Add SPACY as a Matcher attribute (#6463)	2020-11-30 09:34:50 +08:00
Jacob Bortell	fe9009911a	Update rule-based-matching.md (#6421) * Update rule-based-matching.md Clarified case-sensititivy of dictionary-referencing attributes (POS/TAG/DEP/etc). Clarified "Type" column header to "Value Type" * Update rule-based-matching.md Improved clarity of wording	2020-11-24 16:20:19 +01:00
Yusuke Mori	e3ac90b035	Avoid a SyntaxError in self-attentive-parser (#6428) * Avoid a SyntaxError in self-attentive-parser Fix a usage of quotation marks in the example of spaCy Universe self-attentive-parser * Create forest1988.md Fill in the spaCy contributor agreement	2020-11-22 21:59:37 +01:00
M. Revuelta Espinosa	51232ffb9e	Update universe.json (include PatternOmatic) (#6399) Request to include PatternOmatic in spaCy Universe Adds @revuel to contributors	2020-11-19 13:15:50 +01:00
Adriane Boyd	3cf6479467	Fix JSON in #6395	2020-11-17 15:25:41 +01:00
Sam Edwardes	78913a4f95	Added spaCyTextBlob to universe.json (#6395)	2020-11-17 14:38:34 +01:00
Ines Montani	4d337eedf2	Merge pull request #6322 from medspacy/master	2020-11-10 02:47:29 +01:00
Adriane Boyd	8644ee3e3f	Update TIGER link and tag description (#6344)	2020-11-05 09:33:00 +01:00
Alec Chapman	204c7c8a00	fix thumbnail link to be github raw url	2020-11-01 07:53:48 -07:00
Alec Chapman	73d22d96ff	add medspacy to universe and fix example w/ cov-bsv	2020-10-29 07:53:56 -06:00
Adriane Boyd	8cc5ed6771	Add Macedonian to website languages	2020-10-29 08:49:56 +01:00
Adriane Boyd	4dd86306e9	Add Nepali to supported languages on website (#6315)	2020-10-28 16:32:07 +01:00
Kunal Sharma	01aec7a313	Adding MindMeld to Universe JSON (#6275) * Adding Mindmeld to Universe JSON Mindmeld is a conversational AI platform for deep-domain voice interfaces and chatbots. https://www.mindmeld.com/ * Signing contribution agreement. Co-authored-by: kunshar2 <kunshar2@cisco.com>	2020-10-21 18:42:11 +02:00
Ines Montani	3851300e80	Update landing [ci skip]	2020-10-16 11:46:33 +02:00
delzac	15ea401b39	Reflect on usage doc that IS_SENT_START attribute exist (#6114) * Reflect on usage doc that IS_SENT_START attribute exist * Create delzac.md	2020-10-06 15:11:01 +02:00
Šarūnas Navickas	047fb9f8b8	Website (Universe): An entry for rita-dsl (#6138) * Create zaibacu.md * Add RITA-DSL entry * Update agreement * Fix formatting	2020-10-06 11:19:36 +02:00
Ines Montani	27c5795ea5	Fix version check in models directory [ci skip]	2020-09-25 09:23:29 +02:00
Marek Grzenkowicz	a26f864ed3	Clarify how to choose pretrained weights files (closes #6027) [ci skip] (#6039)	2020-09-08 21:13:50 +02:00
Ines Montani	33d9c64977	Fix outbound link and update package lock [ci skip]	2020-09-04 14:44:38 +02:00
Ines Montani	ba6cf9821f	Replace docs analytics [ci skip]	2020-09-04 14:28:28 +02:00
Brad Jascob	2160aafec6	Updates spaCy Universe for amrlib (#6020) * Updates spaCy Universe for amrlib * Updates to doc based on feedback	2020-09-04 10:03:35 +02:00
Juan Gutiérrez	9002bea29f	Update suffixes example (#5989) * Update suffixes example The current example will throw `TypeError: can only concatenate list (not "tuple") to list` * Signing Contributor Agreement	2020-08-31 12:44:56 +02:00
Bram Vanroy	9e45d064bb	Update universe details spacy_conll (#5871)	2020-08-05 14:34:12 +02:00
Adriane Boyd	c62fd878a3	Allow Doc.char_span to snap to token boundaries (#5849 ) * Allow Doc.char_span to snap to token boundaries Add a `mode` option to allow `Doc.char_span` to snap to token boundaries. The `mode` options: * `strict`: character offsets must match token boundaries (default, same as before) * `inside`: all tokens completely within the character span * `outside`: all tokens at least partially covered by the character span Add a new helper function `token_by_char` that returns the token corresponding to a character position in the text. Update `token_by_start` and `token_by_end` to use `token_by_char` for more efficient searching. * Remove unused import * Rename mode to alignment_mode Rename `mode` to `alignment_mode` with the options `strict`/`contract`/`expand`. Any unrecognized modes are silently converted to `strict`.	2020-08-04 13:36:32 +02:00
Adriane Boyd	2880d8a555	Normalize spelling for spaCy (#5822)	2020-07-27 10:09:33 +02:00
Martino Mensio	2f6b8132ef	Sentence transformers added to spaCy universe (#5814) * fix details for spacy-universal-sentence-encoder * added sentence-transformers	2020-07-27 09:44:33 +02:00
Nipun Sadvilkar	a66ad89fcb	✏️ typo in pysbd code example (#5821)	2020-07-27 09:43:39 +02:00
Li Zhe	a69eb445dc	fix the wrong hash url in adding-languages.md file (#5810 ) * fix the wrong hash url in adding-languages.md file change the #101 url hash path to #language-data * filled in the spaCy Contributor Agreement filled in the spaCy Contributor Agreement	2020-07-25 13:13:38 +02:00
Alec Chapman	a8978ca285	Add VA COVID-19 NLP project to spaCy Universe (#5777) * Update universe.json Add cov-bsv to "resources" * Update universe.json * add contributor agreement	2020-07-19 13:35:31 +02:00
Adriane Boyd	cd5af72c9a	Update pkuseg version (#5774) * Update pkuseg version in Chinese tokenizer warnings * Update pkuseg version in `Makefile` * Remove warning about python3.8 wheels in docs	2020-07-19 11:09:49 +02:00
Ines Montani	6f4e4aceb3	Add Plausible [ci skip]	2020-07-18 23:50:29 +02:00
gandersen101	893133873d	Fix quote issue in spaczz universe.json	2020-07-07 19:16:28 -05:00
Ines Montani	109849bd31	Fix and update universe.json [ci skip]	2020-07-07 21:12:28 +02:00
gandersen101	9097549227	Adding spaczz package to universe.json (#5717) * Adding spaczz package to universe.json * Adding contributor agreement.	2020-07-07 20:55:24 +02:00
Jonathan Besomi	546f3d10d4	Add texthero to universe.json (#5716) * Add texthero to universe.json * Add spaCy contributor Agreement	2020-07-07 20:54:22 +02:00
Matthew Honnibal	3e78e82a83	Experimental character-based pretraining (#5700 ) * Use cosine loss in Cloze multitask * Fix char_embed for gpu * Call resume_training for base model in train CLI * Fix bilstm_depth default in pretrain command * Implement character-based pretraining objective * Use chars loss in ClozeMultitask * Add method to decode predicted characters * Fix number characters * Rescale gradients for mlm * Fix char embed+vectors in ml * Fix pipes * Fix pretrain args * Move get_characters_loss * Fix import * Fix import * Mention characters loss option in pretrain * Remove broken 'self attention' option in pretrain * Revert "Remove broken 'self attention' option in pretrain" This reverts commit `56b820f6af`. * Document 'characters' objective of pretrain	2020-07-05 15:48:39 +02:00
Álvaro Abella Bascarán	ff0dbe5c64	Fix in docs: pipe(docs) instead of pipe(texts) (#5680 ) Very minor fix in docs, specifically in this part: ``` matcher = PhraseMatcher(nlp.vocab) > for doc in matcher.pipe(texts, batch_size=50): > pass ``` `texts` suggests the input is an iterable of strings. I replaced it for `docs`.	2020-06-30 20:00:50 +02:00
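
Commit c62fd878a3 above describes three alignment modes for `Doc.char_span`: `strict`, `contract`, and `expand`. The following is a minimal pure-Python sketch of those semantics as the commit message states them, not spaCy's implementation. The function name `align_char_span` and the `(start, end)` token offset representation are hypothetical and exist only for illustration.

```python
from typing import List, Optional, Tuple


def align_char_span(
    token_offsets: List[Tuple[int, int]],
    start: int,
    end: int,
    alignment_mode: str = "strict",
) -> Optional[Tuple[int, int]]:
    """Map a character span to a token range (first, last + 1), or None.

    Hypothetical helper: token_offsets holds (start_char, end_char) per token.
    """
    # Per the commit message, unrecognized modes are treated as "strict".
    if alignment_mode not in ("strict", "contract", "expand"):
        alignment_mode = "strict"

    if alignment_mode == "strict":
        # Both character offsets must coincide with token boundaries.
        starts = {s: i for i, (s, _) in enumerate(token_offsets)}
        ends = {e: i for i, (_, e) in enumerate(token_offsets)}
        if start in starts and end in ends and starts[start] <= ends[end]:
            return starts[start], ends[end] + 1
        return None

    if alignment_mode == "contract":
        # Keep only tokens that lie completely within the character span.
        covered = [
            i for i, (s, e) in enumerate(token_offsets) if s >= start and e <= end
        ]
    else:
        # "expand": keep every token at least partially covered by the span.
        covered = [
            i for i, (s, e) in enumerate(token_offsets) if s < end and e > start
        ]

    if not covered:
        return None
    return covered[0], covered[-1] + 1


# Character offsets of the tokens in "Hello brave world".
offsets = [(0, 5), (6, 11), (12, 17)]
print(align_char_span(offsets, 6, 11))              # exact token boundaries
print(align_char_span(offsets, 3, 14))              # strict: no match
print(align_char_span(offsets, 3, 14, "contract"))  # shrink to whole tokens
print(align_char_span(offsets, 3, 14, "expand"))    # grow to touched tokens
```

In spaCy itself the result is a `Span` rather than an index pair; the sketch returns indices only to stay dependency-free.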


1722 Commits