spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-11-10 19:57:17 +03:00

Author	SHA1	Message	Date
Shashank	450720aca2	Added support for Sanskrit language (#5956 ) * Added support for Sanskrit language * Added tests for lexical attribute like_num	2020-08-25 10:56:29 +02:00
idoshr	b10c7bc56e	Hebrew like num (#5952 ) * Update stop_words.py Hebrew STOP WORDS * Update stop_words.py * contributor * contributor * add some common domain extentions support human number 1K/1M.... * support human number 1K/1M.... * hebrew number tokenize 1K/1M implement in EN * test human tokenize fix * test * heb like num revert human number change * heb like num	2020-08-24 14:30:05 +02:00
Attila Szász	669dc70822	Create tilusnet.md (#5914 )	2020-08-12 22:46:08 +02:00
Adam Bittlingmayer	7b33b2854f	Add Armenian sentence-final verchaket, Greek question mark and Arabic question mark to default punct (#5910 ) * Add Armenian sentence-final verchaket * Add Greek and Arabic question marks, and contributor agreement * Check box	2020-08-12 15:36:14 +02:00
graue70	49e690bde1	Fix typos in comments (#5904 ) * Fix typo in comment * Fix typo * Add spaCy Contributor Agreement	2020-08-12 15:35:25 +02:00
holubvl3	d16c0f2c3a	Create holubvl3 (#5845 ) * Create holubvl3 * Rename holubvl3 to holubvl3.md Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2020-07-30 17:40:31 +02:00
Gustavo Zadrozny Leyendecker	90b958fd01	Fix on EntityRendered to support break lines (after last entity) (closes #5838 )	2020-07-29 18:48:39 +02:00
Li Zhe	a69eb445dc	fix the wrong hash url in adding-languages.md file (#5810 ) * fix the wrong hash url in adding-languages.md file change the #101 url hash path to #language-data * filled in the spaCy Contributor Agreement filled in the spaCy Contributor Agreement	2020-07-25 13:13:38 +02:00
Joshua Olson	6d4d5c074c	Mark Japanese documents as tagged. (#5803 ) Mark the document as tagged before returning it to the user from the JapaneseTokenizer. Fixes #5802	2020-07-23 08:57:01 +02:00
Alec Chapman	a8978ca285	Add VA COVID-19 NLP project to spaCy Universe (#5777 ) * Update universe.json Add cov-bsv to "resources" * Update universe.json * add contributor agreement	2020-07-19 13:35:31 +02:00
gandersen101	9097549227	Adding spaczz package to universe.json (#5717 ) * Adding spaczz package to universe.json * Adding contributor agreement.	2020-07-07 20:55:24 +02:00
Jonathan Besomi	546f3d10d4	Add texthero to universe.json (#5716 ) * Add texthero to universe.json * Add spaCy contributor Agreement	2020-07-07 20:54:22 +02:00
Mike Izbicki	7a2ca00794	fix bug in Korean language, resulting in 100x speedup by reducing overhead of mecab (#5701 ) * speed up Korean nlp 100x by stopping mecab from reloading on each doc * add contributor agreement * rename variables to improve code readability	2020-07-06 17:03:33 +02:00
Matthias Hertel	8b0f749606	Website: fixed the token span in the text about the rule-based matching example (#5669 ) * fixed token span in pattern matcher example * contributor agreement	2020-06-30 19:58:23 +02:00
PluieElectrique	90c7eb0e2f	Reduce memory usage of Lookup's BloomFilter (#5606 ) * Reduce memory usage of Lookup's BloomFilter * Remove extra Table update	2020-06-26 14:09:10 +02:00
Richard Liaw	0ef78bad93	contribute (#5632 )	2020-06-23 08:53:58 +02:00
Rameshh	c34420794a	Add Nepali Language (#5622 ) * added support for nepali lang * added examples and test files * added spacy contributor agreement	2020-06-22 10:25:46 +02:00
Karen Hambardzumyan	ff6a084e9c	Create mahnerak.md (#5615 )	2020-06-20 11:14:26 +02:00
Marat M. Yavrumyan	ccd7edf04b	Create myavrum.md (#5612 )	2020-06-19 18:34:27 +02:00
Arvind Srinivasan	aa5b40fa64	Added Tamil Example Sentences (#5583 ) * Added Examples for Tamil Sentences #### Description This PR add example sentences for the Tamil language which were missing as per issue #1107 #### Type of Change This is an enhancement. * Accepting spaCy Contributor Agreement * Signed on my behalf as an individual	2020-06-13 15:56:26 +02:00
theudas	fa46e0bef2	Added Parameter to NEL to take n sentences into account (#5548 ) * added setting for neighbour sentence in NEL * added spaCy contributor agreement * added multi sentence also for training * made the try-except block smaller	2020-06-12 02:03:23 +02:00
Sofie Van Landeghem	18c6dc8093	removing label both on comment and on close	2020-06-11 14:09:40 +02:00
Jones Martins	28db7dd5d9	Add missing pronoums/determiners (#5569 ) * Add missing pronoums/determiners * Add test for missing pronoums * Add contributor file	2020-06-10 18:47:04 +02:00
Sofie Van Landeghem	12c1965070	set delay to 7 days	2020-06-10 10:46:12 +02:00
Sofie Van Landeghem	86112d2168	update issue manager's version	2020-06-09 08:57:38 +02:00
Martino Mensio	de00f967ce	adding spacy-universal-sentence-encoder (#5534 ) * adding spacy-universal-sentence-encoder * update affiliation * updated code example	2020-06-08 20:26:30 +02:00
Sofie Van Landeghem	d1799da200	bot for answered issues (#5563 ) * add tiangolo's issue manager * fix formatting * spaces, tabs, who knows * formatting * I'll get this right at some point * maybe one more space ?	2020-06-08 19:47:32 +02:00
Hiroshi Matsuda	456bf47f51	fix a bug causing mis-alignments (#5560 )	2020-06-08 15:49:34 +02:00
Leo	7d5a89661e	contributor agreement signed (#5525 )	2020-05-31 20:13:39 +02:00
Rajat	8b8efa1b42	update spacy universe with my project (#5497 ) * added contextualSpellCheck in spacy universe meta * removed extra formatting by code * updated with permanent links * run json linter used by spacy * filled SCA * updated the description	2020-05-25 11:30:23 +02:00
Jannis	aa53ce6996	Documentation Typo Fix (#5492 ) * Fix typo Change 'realize' to 'realise' * Add contributer agreement	2020-05-22 19:50:26 +02:00
Matthew Honnibal	93c4d13588	Merge pull request #5264 from lfiedler/issue-5230 Fix ResourceWarnings during unittest	2020-05-22 00:31:07 +02:00
Kevin Lu	291b9ad7b9	Update CONTRIBUTOR_AGREEMENT.md	2020-05-19 20:29:53 -07:00
Kevin Lu	9a1a535215	Create kevinlu1248.md	2020-05-19 20:25:45 -07:00
Kevin Lu	a23b3a5a50	Update CONTRIBUTOR_AGREEMENT.md	2020-05-19 20:24:24 -07:00
Ines Montani	a41e28ceba	Merge pull request #5436 from ilivans/fix_errors_with_codes	2020-05-18 10:45:56 +02:00
Ilkyu Ju	72a25c9cef	Very minor issues in Korean example sentences (#5446 ) * Add contributor agreement * Improve ko translation of example sentences I fixed unnatural translations and word spacing errors. * Update osori.md	2020-05-17 13:43:34 +02:00
Ilia Ivanov	ee8fe37474	Add ilivans' contributor agreement	2020-05-14 15:59:06 +02:00
Vishnu Priya VR	9ce059dd06	Limiting noun_chunks for specific languages (#5396 ) * Limiting noun_chunks for specific langauges * Limiting noun_chunks for specific languages Contributor Agreement * Addressing review comments * Removed unused fixtures and imports * Add fa_tokenizer in test suite * Use fa_tokenizer in test * Undo extraneous reformatting Co-authored-by: adrianeboyd <adrianeboyd@gmail.com>	2020-05-14 12:58:06 +02:00
Travis Hoppe	d4cc18b746	Added author information for NLPre (#5414 ) * Add author links for NLPre and update category * Add contributor statement	2020-05-08 11:28:54 +02:00
Samuel Rodríguez Medina	8602daba85	Swedish like_num (#5371 ) * Sign contributor agreement. * Add like_num functionality to Swedish. * Update spacy/tests/lang/sv/test_lex_attrs.py Co-Authored-By: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update contributor agreement Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2020-04-29 21:25:22 +02:00
adrianeboyd	a6e521cd79	Add is_sent_end token property (#5375 ) Reconstruction of the original PR #4697 by @MiniLau. Removes unused `SENT_END` symbol and `IS_SENT_END` from `Matcher` schema because the Matcher is only going to be able to support `IS_SENT_START`.	2020-04-29 12:53:16 +02:00
Louis Guitton	a27c4014f5	Add mlflow to spaCy universe (#5352 ) * Add mlflow to universe * Use mlflow black logo	2020-04-29 10:18:03 +02:00
Michael	5b5528ff2e	Add `!=3.4.*` to python_requires (#5344 ) Missed in `80d554f2e2`	2020-04-27 22:02:09 +02:00
Punitvara	b2b7e1f37a	This PR adds Gujarati Language class along with (#5355 ) * This PR adds Gujarati Language class along with - stop words * Add test for gu tokenizer	2020-04-27 11:07:37 +02:00
sabiqueqb	fc91660aa2	Gh 5339 language class for malayalam (#5342 ) * Initialize Malayalam Language class * Add lex_attrs and examples for Malayalam * Add spaCy Contributor Agreement * Add test for ml tokenizer	2020-04-27 09:45:08 +02:00
Mike	481574cbc8	[minor doc change] embedding vis. link is broken in `website/docs/usage/examples.md` (#5325 ) * The embedding vis. link is broken The first link seems to be reasonable for now unless someone has an updated embedding vis they want to share? * contributor agreement * Update Mlawrence95.md * Update website/docs/usage/examples.md Co-Authored-By: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2020-04-21 20:35:12 +02:00
laszabine	fb73d4943a	Amend documentation to Language.evaluate (#5319 ) * Specified usage of arguments to Language.evaluate * Created contributor agreement	2020-04-16 20:00:18 +02:00
Jakob Jul Elben	663333c3b2	Fixes #5413 (#5315 ) * Fix 5314 * Add contributor * Resolve requested changes Co-authored-by: Jakob Jul Elben <jakob@datamaga.com>	2020-04-16 13:29:02 +02:00
Sébastien Harinck	dac70f29eb	contrib: add contributor agreement for user sebastienharinck (#5316 )	2020-04-16 11:32:09 +02:00

400 Commits