spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-12-27 10:26:35 +03:00

Author	SHA1	Message	Date
Gustavo Zadrozny Leyendecker	90b958fd01	Fix on EntityRendered to support break lines (after last entity) (closes #5838 )	2020-07-29 18:48:39 +02:00
Li Zhe	a69eb445dc	fix the wrong hash url in adding-languages.md file (#5810 ) * fix the wrong hash url in adding-languages.md file change the #101 url hash path to #language-data * filled in the spaCy Contributor Agreement filled in the spaCy Contributor Agreement	2020-07-25 13:13:38 +02:00
Joshua Olson	6d4d5c074c	Mark Japanese documents as tagged. (#5803 ) Mark the document as tagged before returning it to the user from the JapaneseTokenizer. Fixes #5802	2020-07-23 08:57:01 +02:00
Ines Montani	644074b954	Merge branch 'develop' into master-tmp	2020-07-20 14:58:04 +02:00
Alec Chapman	a8978ca285	Add VA COVID-19 NLP project to spaCy Universe (#5777 ) * Update universe.json Add cov-bsv to "resources" * Update universe.json * add contributor agreement	2020-07-19 13:35:31 +02:00
gandersen101	9097549227	Adding spaczz package to universe.json (#5717 ) * Adding spaczz package to universe.json * Adding contributor agreement.	2020-07-07 20:55:24 +02:00
Jonathan Besomi	546f3d10d4	Add texthero to universe.json (#5716 ) * Add texthero to universe.json * Add spaCy contributor Agreement	2020-07-07 20:54:22 +02:00
Mike Izbicki	7a2ca00794	fix bug in Korean language, resulting in 100x speedup by reducing overhead of mecab (#5701 ) * speed up Korean nlp 100x by stopping mecab from reloading on each doc * add contributor agreement * rename variables to improve code readability	2020-07-06 17:03:33 +02:00
Sebastián Ramírez	b985cc4025	📄 Add spaCy Contributor Agreement	2020-07-01 20:57:21 +02:00
Ines Montani	414dc7ace1	Merge branch 'spacy.io' into spacy.io-develop	2020-07-01 11:47:47 +02:00
Matthias Hertel	305221f3e5	Website: fixed the token span in the text about the rule-based matching example (#5669 ) * fixed token span in pattern matcher example * contributor agreement	2020-06-30 19:58:55 +02:00
Matthias Hertel	8b0f749606	Website: fixed the token span in the text about the rule-based matching example (#5669 ) * fixed token span in pattern matcher example * contributor agreement	2020-06-30 19:58:23 +02:00
PluieElectrique	90c7eb0e2f	Reduce memory usage of Lookup's BloomFilter (#5606 ) * Reduce memory usage of Lookup's BloomFilter * Remove extra Table update	2020-06-26 14:09:10 +02:00
Richard Liaw	0ef78bad93	contribute (#5632 )	2020-06-23 08:53:58 +02:00
Rameshh	c34420794a	Add Nepali Language (#5622 ) * added support for nepali lang * added examples and test files * added spacy contributor agreement	2020-06-22 10:25:46 +02:00
Karen Hambardzumyan	ff6a084e9c	Create mahnerak.md (#5615 )	2020-06-20 11:14:26 +02:00
Marat M. Yavrumyan	ccd7edf04b	Create myavrum.md (#5612 )	2020-06-19 18:34:27 +02:00
Arvind Srinivasan	aa5b40fa64	Added Tamil Example Sentences (#5583 ) * Added Examples for Tamil Sentences #### Description This PR add example sentences for the Tamil language which were missing as per issue #1107 #### Type of Change This is an enhancement. * Accepting spaCy Contributor Agreement * Signed on my behalf as an individual	2020-06-13 15:56:26 +02:00
theudas	fa46e0bef2	Added Parameter to NEL to take n sentences into account (#5548 ) * added setting for neighbour sentence in NEL * added spaCy contributor agreement * added multi sentence also for training * made the try-except block smaller	2020-06-12 02:03:23 +02:00
Sofie Van Landeghem	18c6dc8093	removing label both on comment and on close	2020-06-11 14:09:40 +02:00
Jones Martins	28db7dd5d9	Add missing pronoums/determiners (#5569 ) * Add missing pronoums/determiners * Add test for missing pronoums * Add contributor file	2020-06-10 18:47:04 +02:00
Sofie Van Landeghem	12c1965070	set delay to 7 days	2020-06-10 10:46:12 +02:00
Sofie Van Landeghem	86112d2168	update issue manager's version	2020-06-09 08:57:38 +02:00
Martino Mensio	de00f967ce	adding spacy-universal-sentence-encoder (#5534 ) * adding spacy-universal-sentence-encoder * update affiliation * updated code example	2020-06-08 20:26:30 +02:00
Sofie Van Landeghem	d1799da200	bot for answered issues (#5563 ) * add tiangolo's issue manager * fix formatting * spaces, tabs, who knows * formatting * I'll get this right at some point * maybe one more space ?	2020-06-08 19:47:32 +02:00
Hiroshi Matsuda	456bf47f51	fix a bug causing mis-alignments (#5560 )	2020-06-08 15:49:34 +02:00
Leo	7d5a89661e	contributor agreement signed (#5525 )	2020-05-31 20:13:39 +02:00
Rajat	8b8efa1b42	update spacy universe with my project (#5497 ) * added contextualSpellCheck in spacy universe meta * removed extra formatting by code * updated with permanent links * run json linter used by spacy * filled SCA * updated the description	2020-05-25 11:30:23 +02:00
Jannis	aa53ce6996	Documentation Typo Fix (#5492 ) * Fix typo Change 'realize' to 'realise' * Add contributer agreement	2020-05-22 19:50:26 +02:00
Matthew Honnibal	93c4d13588	Merge pull request #5264 from lfiedler/issue-5230 Fix ResourceWarnings during unittest	2020-05-22 00:31:07 +02:00
Kevin Lu	291b9ad7b9	Update CONTRIBUTOR_AGREEMENT.md	2020-05-19 20:29:53 -07:00
Kevin Lu	9a1a535215	Create kevinlu1248.md	2020-05-19 20:25:45 -07:00
Kevin Lu	a23b3a5a50	Update CONTRIBUTOR_AGREEMENT.md	2020-05-19 20:24:24 -07:00
Ines Montani	a41e28ceba	Merge pull request #5436 from ilivans/fix_errors_with_codes	2020-05-18 10:45:56 +02:00
Ilkyu Ju	72a25c9cef	Very minor issues in Korean example sentences (#5446 ) * Add contributor agreement * Improve ko translation of example sentences I fixed unnatural translations and word spacing errors. * Update osori.md	2020-05-17 13:43:34 +02:00
Ilia Ivanov	ee8fe37474	Add ilivans' contributor agreement	2020-05-14 15:59:06 +02:00
Vishnu Priya VR	9ce059dd06	Limiting noun_chunks for specific languages (#5396 ) * Limiting noun_chunks for specific langauges * Limiting noun_chunks for specific languages Contributor Agreement * Addressing review comments * Removed unused fixtures and imports * Add fa_tokenizer in test suite * Use fa_tokenizer in test * Undo extraneous reformatting Co-authored-by: adrianeboyd <adrianeboyd@gmail.com>	2020-05-14 12:58:06 +02:00
Travis Hoppe	d4cc18b746	Added author information for NLPre (#5414 ) * Add author links for NLPre and update category * Add contributor statement	2020-05-08 11:28:54 +02:00
Samuel Rodríguez Medina	8602daba85	Swedish like_num (#5371 ) * Sign contributor agreement. * Add like_num functionality to Swedish. * Update spacy/tests/lang/sv/test_lex_attrs.py Co-Authored-By: Sofie Van Landeghem <svlandeg@users.noreply.github.com> * Update contributor agreement Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2020-04-29 21:25:22 +02:00
adrianeboyd	a6e521cd79	Add is_sent_end token property (#5375 ) Reconstruction of the original PR #4697 by @MiniLau. Removes unused `SENT_END` symbol and `IS_SENT_END` from `Matcher` schema because the Matcher is only going to be able to support `IS_SENT_START`.	2020-04-29 12:53:16 +02:00
Louis Guitton	a27c4014f5	Add mlflow to spaCy universe (#5352 ) * Add mlflow to universe * Use mlflow black logo	2020-04-29 10:18:03 +02:00
Michael	5b5528ff2e	Add `!=3.4.*` to python_requires (#5344 ) Missed in `80d554f2e2`	2020-04-27 22:02:09 +02:00
Punitvara	b2b7e1f37a	This PR adds Gujarati Language class along with (#5355 ) * This PR adds Gujarati Language class along with - stop words * Add test for gu tokenizer	2020-04-27 11:07:37 +02:00
sabiqueqb	fc91660aa2	Gh 5339 language class for malayalam (#5342 ) * Initialize Malayalam Language class * Add lex_attrs and examples for Malayalam * Add spaCy Contributor Agreement * Add test for ml tokenizer	2020-04-27 09:45:08 +02:00
Mike	481574cbc8	[minor doc change] embedding vis. link is broken in `website/docs/usage/examples.md` (#5325 ) * The embedding vis. link is broken The first link seems to be reasonable for now unless someone has an updated embedding vis they want to share? * contributor agreement * Update Mlawrence95.md * Update website/docs/usage/examples.md Co-Authored-By: Sofie Van Landeghem <svlandeg@users.noreply.github.com> Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>	2020-04-21 20:35:12 +02:00
laszabine	fb73d4943a	Amend documentation to Language.evaluate (#5319 ) * Specified usage of arguments to Language.evaluate * Created contributor agreement	2020-04-16 20:00:18 +02:00
Jakob Jul Elben	663333c3b2	Fixes #5413 (#5315 ) * Fix 5314 * Add contributor * Resolve requested changes Co-authored-by: Jakob Jul Elben <jakob@datamaga.com>	2020-04-16 13:29:02 +02:00
Sébastien Harinck	dac70f29eb	contrib: add contributor agreement for user sebastienharinck (#5316 )	2020-04-16 11:32:09 +02:00
Paolo Arduin	1ca32d8f9c	Matcher support for Span as well as Doc (#5113 ) * Matcher support for Span, as well as Doc #5056 * Removes an import unused * Signed contributors agreement * Code optimization and better test * Add error message for bad Matcher call argument * Fix merging	2020-04-15 13:51:33 +02:00
Thomas Thiebaud	1eef60c658	Add spacy_fastlang to universe (#5271 ) * Add spacy_fastlang to universe * Sign SCA	2020-04-15 13:50:46 +02:00
Paolo Arduin	8ce408d2e1	Comparison predicate handling for `!=` (#5282 ) * Fix #5281 * Optim test	2020-04-14 19:14:15 +02:00
Marek Grzenkowicz	6a8a52650f	[Closes #5292 ] Fix typo in option name "--n-save_every" (#5293 ) * Sign contributor agreement for chopeen * Fix typo in option name and close #5292	2020-04-11 23:35:01 +02:00
Umar Butler	8952effcc4	Fixed Typo in Warning (#5284 ) * Fixed typo in cli warning Fixed a typo in the warning for the provision of exactly two labels, which have not been designated as binary, to textcat. * Create and signed contributor form	2020-04-09 15:46:15 +02:00
Leander Fiedler	b63871ceff	issue5230: added contributors agreement	2020-04-06 21:04:06 +02:00
vincent d warmerdam	f329d5663a	add "whatlies" to spaCy universe (#5252 ) * Add "whatlies" We're releasing it on our side officially on the 16th of April. If possible, let's announce around the same time :) * sign contributor thing * Added fancy gif as the image * Update universe.json Spellin error and spaCy clarification.	2020-04-06 11:29:30 +02:00
YohannesDatasci	beef184e53	Armenian language support (#5246 ) * add Armenian language and test cases * agreement submission	2020-04-03 13:02:18 +02:00
Michael Leichtfried	2b14997b68	Remove duplicated branch in if/else-if statement (#5234 ) * Remove duplicated branch in if-elif-statement * Add contributor agreement for leicmi	2020-04-02 14:47:42 +02:00
Jacob Lauritzen	0b76212831	Extend and fix Danish examples (#5227 ) * Extend and fix Danish examples This PR fixes two examples, adds additional examples translated from the english version, and adds punctuation. The two changed examples are: * "fortov" changed to "fortovet", which is more [used](https://www.google.com/search?client=firefox-b-d&sxsrf=ALeKk0143gEuPe4IbIUpzBBt-oU10OMVqA%3A1585549036477&ei=7I6BXuvJHMGOrwSqi46oCQ&q=l%C3%B8behjul+p%C3%A5+fortov&oq=l%C3%B8behjul+p%C3%A5+fortov&gs_lcp=CgZwc3ktYWIQAzIECAAQRzIECAAQRzIECAAQRzIECAAQRzIECAAQRzIECAAQRzIECAAQRzIECAAQR1DT8xZY0_MWYK_0FmgAcAZ4AIABAIgBAJIBAJgBAKABAaoBB2d3cy13aXo&sclient=psy-ab&ved=0ahUKEwjr7964xsHoAhVBx4sKHaqFA5UQ4dUDCAo&uact=5) and more natural. The Swedish and Norwegian examples also use this version of the word. * "stor by" changed to "storby". In Danish we have a specific noun to describe a large, metropolitan city which is different from just describing a city as "large". In this sentence it would be much more natural to describe London as a "storby". Google even correct as search for "London stor by" to "London storby". * Sign contrib agreement	2020-04-02 10:42:35 +02:00
Nikhil Saldanha	4f27a24f5b	Add kannada examples (#5162 ) * Add example sentences for Kannada * sign contributor agreement	2020-03-29 13:54:42 +02:00
Tom Milligan	e904958115	Limit to cupy-cuda v8, so as not to pull in v9 automatically. (#5194 )	2020-03-29 13:52:08 +02:00
Tiljander	e53232533b	Describing priority rules for overlapping matches (#5197 ) * Describing priority rules for overlapping matches * Create Tiljander.md * Describing priority rules for overlapping matches * Update website/docs/api/entityruler.md Co-Authored-By: Ines Montani <ines@ines.io> Co-authored-by: Ines Montani <ines@ines.io>	2020-03-26 13:13:22 +01:00
Ines Montani	3fc2309c48	Merge pull request #5174 from Baciccin/master Add Ligurian language	2020-03-24 16:33:59 +01:00
Philip Gillißen	128acb9ee1	Update guerda.md	2020-03-24 10:42:30 +01:00
Philip Gillißen	5d067bcc5e	Add SCA for guerda	2020-03-24 10:42:10 +01:00
Baciccin	3b53617a69	Add Ligurian language	2020-03-19 21:37:01 -07:00
Ines Montani	17bd9ed84f	Merge pull request #5153 from pinealan/fix/website-docs Fix website typos and weird sentences	2020-03-16 15:03:01 +01:00
Alan Chan	1ae01684cf	Fill in contributor agreement	2020-03-15 03:45:20 +08:00
nihil	9cde7eb08c	add spacy_syllables to universe + sign contributor agreement	2020-03-13 18:09:42 +01:00
Himanshu Garg	27d1300bdb	Create merrcury.md	2020-03-10 15:11:07 +05:30
Mark Abraham	0345135167	Tokenizer to_disk and from_disk now ensure paths (#5116 ) * Tokenizer to_disk and from_disk now ensure strings are converted to paths Fixes #5115 * Sign contributor agreement	2020-03-08 13:25:56 +01:00
David Pollack	80004930ed	fix typo in svg file	2020-03-05 17:04:33 +01:00
Tom Keefe	ddf63b97a8	make idx available via to_array (#5030 )	2020-02-22 14:13:06 +01:00
Jan Jessewitsch	c7e4fe9c5c	Fix/Improve german stop words (#5024 ) * Fix german stop words Two stop words ("einige" and "einigen") are sticking together. Remove three nouns that may serve as stop words in a specific context (e.g. religious or news) but are not applicable for general use. * Create Jan-711.md	2020-02-17 18:59:22 +01:00
Filip Bednárik	d4f4060bf3	Add Slovak language tools implementation (#4943 ) * Add correct stopwords for Slovak language * Add SNK Tags * Disable formatting lint for TAGS * Add example sentences for Slovak language * Add slovak numerals in base form * Add lex_attrs to sk init * Add contributor agreement	2020-02-03 13:03:59 +01:00
Tyler Couto	9fa9d7f2cb	Fix for Issue 4665 - conllu2json (#4953 ) * Fix for Issue 4665 - conllu2json - Allowing HEAD to be an underscore * Added contributor agreement	2020-02-03 13:01:48 +01:00
Paco Nathan	49fefb6139	Submitting `PyTextRank` for inclusion in the spaCy uniVerse (#4942 ) * submitting PyTextRank for consideration of including in the spaCy uniVerse * including SCA	2020-01-28 11:37:54 +01:00
Anastasiia Iurshina	1830a12578	Fixes typos (#4843 ) * Fixes typos * Fixes typo * Contributor agreement	2019-12-29 14:24:13 +01:00
Ivan Echevarria	ef13e0c038	Add n_process to Language.pipe documentation (#4842 ) [ci skip] * Add n_process to documentation * Auto-format and add default [ci skip] Co-authored-by: Ines Montani <ines@ines.io>	2019-12-29 14:23:33 +01:00
Al Johri	fd4a7bd2b7	sign contributor agreement for AlJohri (#4839 ) [ci skip]	2019-12-29 14:17:28 +01:00
Olamilekan Wahab	a741de7cf6	Adding support for Yoruba Language (#4614 ) * Adding Support for Yoruba * test text * Updated test string. * Fixing encoding declaration. * Adding encoding to stop_words.py * Added contributor agreement and removed iranlowo. * Added removed test files and removed iranlowo to keep project bare. * Returned CONTRIBUTING.md to default state. * Added delted conftest entries * Tidy up and auto-format * Revert CONTRIBUTING.md Co-authored-by: Ines Montani <ines@ines.io>	2019-12-21 14:11:50 +01:00
Nicolai Bjerre Pedersen	de5453cdcb	Fix link to user hooks in docs (#4778 ) * Fix link to user hooks in docs * Update mr_bjerre.md Mistake in contributor agreement * Apparently hard to get it right (wrong name of sca)	2019-12-06 19:17:12 +01:00
Antti Ajanki	e626a011cc	Improvements to the Finnish language data (#4738 ) * Enable lex_attrs on Finnish * Copy the Danish tokenizer rules to Finnish Specifically, don't break hyphenated compound words * Contributor agreement * A new file for Finnish tokenizer rules instead of including the Danish ones	2019-12-03 12:55:28 +01:00
Matt Maybeno	c9f1e99787	Agnostic vocab array fix (#4680 ) * Use get_array_module instead of numpy * add contributor agreement	2019-11-23 14:59:52 +01:00
GuiGel	8f7ab70870	Bugfix/fix entity ruler from disk (#4670 ) * fix EntityRuler from_disk bug * add contributor file * Test EntityRuler PhraseMatcher deserialization (#4651) * newline at end of file * fix copy paste error * serializing the EntityRuler by itself * Add unicode declarations for Python 2 and auto-format	2019-11-21 16:26:37 +01:00
Elijah Rippeth	5ad5c4b44a	Add initial Korean support (#4660 ) * add hangul and jamo char classes. * add initial Korean lexical attributes. * add contributor agreement	2019-11-18 12:56:07 +01:00
Christoph Purschke	433748e867	Fix basic language support for Luxembourgish (by adding punctuation.py) (#4648 ) * Update __init__.py * Create punctuation.py * Update tokenizer_exceptions.py * Create questoph.md * Update questoph.md * Update test_text.py * Update test_text.py * Update test_text.py * Update test_text.py	2019-11-15 16:16:47 +01:00
Priscilla de Abreu Lopes	39e79fcc86	Bugfix/dep matcher issue 4590 (#4601 ) * add contributor agreement for prilopes * add test for issue #4590 * fix on_match params for DependencyMacther (#4590)	2019-11-07 12:01:06 +01:00
Neel Kamath	6c036ab57d	Add "spaCy Server" to spaCy Universe (#4553 ) * Add "spaCy Server" to spaCy Universe * Accept the spaCy Contributor Agreement	2019-10-30 13:20:46 +01:00
Ines Montani	1185702993	Port over contributor agreement from spacy-lookups-data [ci skip]	2019-10-25 13:06:10 +02:00
Zhuoru Lin	10d88b09bb	Bugfix/fix wikidata train entity linker (#4509 ) * Fix labels_discard Nonetype iteration error * Contributor agreement for Zhuoru Lin * Enhance EntityLinker.predict() to handle labels_discard is None case.	2019-10-24 12:52:59 +02:00
gustavengstrom	050e2445a8	Adding noun_chunks to the Swedish language model (sv) (#4422 ) * Create syntax_iterators.py Replica of spacy/lang/fr/syntax_iterators.py * Added import statements for SYNTAX_ITERATORS * Create gustavengstrom.md * Added "dobj" to list of labels in noun_chunks method and a test_noun_chunks method to the Swedish language model. * Delete README-checkpoint.md Co-authored-by: Gustav <gustav@davcon.se> Co-authored-by: Ines Montani <ines@ines.io>	2019-10-21 12:57:06 +02:00
Pepe Berba	7772d5d3c5	Update `vocab.get_vector` docs to include features on Fasttext ngram (#4464 ) * Update `vocab.get_vector` * Added contrib agreement	2019-10-20 01:28:18 +02:00
Peter Gilles	428887b8f2	Initial commit: New language Luxembourgish (lb) (#4424 ) * new language: Luxembourgish (lb) * update * update * Update and rename .github/CONTRIBUTOR_AGREEMENT.md to .github/contributors/PeterGilles.md * Update and rename .github/contributors/PeterGilles.md to .github/CONTRIBUTOR_AGREEMENT.md * Update norm_exceptions.py * Delete README.md * moved test_lemma.py * deactivated 'lemma_lookup = LOOKUP' * update * Update conftest.py * update * tests updated * import unicode_literals * Update spacy/tests/lang/lb/test_text.py Co-Authored-By: Ines Montani <ines@ines.io> * Create PeterGilles.md	2019-10-14 12:27:50 +02:00
Ben Taylor	1db79a33cb	most_similar() return the k most similar vectors (#4364 ) * most_similar return n-most similar vectors * updated most_similar comment * add bintay contributor agreement * sign bintay contributor agreement * fix most_similar documentation typo * fixed error in prune_vectors * updated prune_vectors test	2019-10-03 14:09:44 +02:00
Rahul Soni	ed620daa5c	Fix example sentences in Hindi for grammatical errors (#4343 ) * Fix grammar for hindi * Fix grammar for hindi * Submit contributor agreement	2019-09-30 23:32:49 +02:00
Ines Montani	159b72ed4c	Delete main.yml	2019-09-29 15:58:59 +02:00
Ines Montani	539a7b53cd	Update main.yml	2019-09-29 15:55:26 +02:00
Ines Montani	b7913c8eca	Update main.yml	2019-09-29 15:40:07 +02:00
Ines Montani	eb2b60069e	Update main.yml	2019-09-29 15:33:53 +02:00
Ines Montani	70295f9e59	Update main.yml	2019-09-29 15:32:11 +02:00
Ines Montani	b503270b09	Update main.yml	2019-09-29 15:30:31 +02:00
Ines Montani	52ea244830	Fix workflows	2019-09-29 15:30:13 +02:00
Ines Montani	e9acfaec52	Revert "Revert "Rename workflows to _workflows"" This reverts commit `051fac51ee`.	2019-09-29 15:29:02 +02:00
Ines Montani	051fac51ee	Revert "Rename workflows to _workflows" This reverts commit `ba0027c936`.	2019-09-29 15:28:59 +02:00
Ines Montani	7164c687e9	Revert "Merge branch 'master' of https://github.com/explosion/spaCy " This reverts commit `41aab59dbf`, reversing changes made to `ba0027c936`.	2019-09-29 15:28:31 +02:00
Ines Montani	41aab59dbf	Merge branch 'master' of https://github.com/explosion/spaCy	2019-09-29 15:26:32 +02:00
Ines Montani	ba0027c936	Rename workflows to _workflows	2019-09-29 15:26:23 +02:00
Ines Montani	80f67f6065	Update build.yml	2019-09-29 15:24:28 +02:00
Ines Montani	e787e6d47f	Update build.yml	2019-09-29 15:15:34 +02:00
Ines Montani	b2f41e2a9b	Update build.yml	2019-09-29 15:06:19 +02:00
Ines Montani	8b02fff097	Update build.yml	2019-09-29 14:55:43 +02:00
Ines Montani	ace0d5c580	Update build.yml	2019-09-29 14:52:01 +02:00
Ines Montani	d32fb03401	Update build.yml	2019-09-29 14:48:21 +02:00
Ines Montani	a5c0130b50	Update and rename pythonpackage.yml to build.yml	2019-09-29 14:43:48 +02:00
EarlGreyT	1e9e2d8aa1	fix typo in first token (#4327 ) * fix typo in first token The head of 'in' is review which has an offset of 4 and not 44 * added contributor agreement	2019-09-27 14:49:36 +02:00
Jaydeep Borkar	6a06a3fa6a	Update stop_words.py and add name in contributors (#4325 ) * Update stop_words.py and add name in contributors * add jaydeepborkar.md in contributors directory * Reset template [ci skip] Co-authored-by: Ines Montani <ines@ines.io>	2019-09-27 11:57:27 +02:00
Em Zhan	aafa091541	Fix typo in documentation (#4322 ) * Fix typo 'probj' instead of 'pobj' * Add spaCy contributor agreement for zqianem	2019-09-25 19:42:18 +02:00
Sean Löfgren	31c683d87d	add return_matches and as_tuples back to Matcher.pipe (#4303 ) * add contributor agreement [ci skip] * add return_matches and as_tuples back to Matcher.pipe	2019-09-18 22:00:33 +02:00
Moshe Hazoom	72463b062f	Improve speed of _merge method (#4300 ) * make merge more efficient * fix offsets * merge works with relative indices * remove printing * Add the SCA * fix SCA date * more cythonize _retokenize.pyx * more cythonize _retokenize.pyx * fix only declaration in _retokenize.pyx * switch back to absolute head * switch back to absolute head * fix comment * merge from origin repo	2019-09-18 21:34:34 +02:00
tamuhey	71909cdf22	Fix iss4278 (#4279 ) * fix: len(tuple) == 2 * (#4278) add fail test * add contributor's aggreement	2019-09-12 10:44:49 +02:00
Mihai Gliga	25aecd504f	adding Romanian tag_map (#4257 ) * adding Romanian tag_map * added SCA file * forgotten import	2019-09-09 11:53:09 +02:00
Ines Montani	bcd1b12f43	Add contributor agreement [ci skip]	2019-08-30 17:02:43 +02:00
Andrei-Marius Avram	199589228e	Added RONEC to spaCy Universe (#4151 ) * Added RONEC to spaCy Universe * Added contributor file * Corrected date from .github/contributors/avramandrei.md * Convert tabs to spaces * Remove duplicate keys Can only have one GitHub link unfortunately * Also add models category * Adjust ID This is used to generate the URL, so a simpler string is better	2019-08-20 14:46:07 +02:00
Ivan Šarić	434f6fa6c1	Issue #1107 - adds examples.py for Croatian language (#4143 ) * adds contributor agreement for isaric * adds examples.py for croatian language	2019-08-18 23:04:41 +02:00
yanaiela	ec0beccaf1	Custom entity render (#4117 ) * customizable template for entities display, allowing to pass additional parameters along each entity * contributor agreement * simpler naming for the additional parameters given to the span entities renderer Co-Authored-By: Ines Montani <ines@ines.io> * change of default parameter, as suggested Co-Authored-By: Ines Montani <ines@ines.io>	2019-08-16 18:39:25 +02:00
Ziming He	eea7d4f4a8	biluo_tags_from_offsets throw exception for overlapping entities (#4021 ) * Check whether two entities overlap - biluo_gold_biluo_overlap now throw exception when entities passed in have overlaps - added unit test * SCA agreement	2019-08-15 18:13:32 +02:00
AJ Rader	2f3648700c	Correction of default lemmatizer lookup in English (Issue # 4104) (#4110 ) * pytest file for issue4104 established * edited default lookup english lemmatizer for spun; fixes issue 4102 * eliminated parameterization and sorted dictionary dependnency in issue 4104 test * added contributor agreement	2019-08-15 11:39:10 +02:00
Ines Montani	5196dbd89d	Delete wip.yml [ci skip]	2019-08-13 13:31:21 +02:00
Ines Montani	35c865024b	Fix file name [ci skip]	2019-08-12 18:39:54 +02:00
Ines Montani	3a39154804	Create wip.yaml [ci skip]	2019-08-12 17:26:31 +02:00
黎谢鹏	250a54414b	update lang/zh (#4103 ) * update lang/zh * update lang/zh	2019-08-12 10:37:48 +02:00
ICLR&D	87e40b17a0	Add entry for Blackstone in universe.json (#4101 ) * Add entry for Blackstone in universe.json Add an entry for the Blackstone project. Checked JSON is valid. * Create ICLRandD.md * Fix indentation (tabs to spaces) It looks like during validation, the JSON file automatically changed spaces to tabs. This caused the diff to show everything as changed, which is obviously not true. This hopefully fixes that. * Try to fix formatting for diff * Fix diff Co-authored-by: Ines Montani <ines@ines.io>	2019-08-09 17:16:51 +02:00
Jeno	15be09ceb0	Raise error if annotation dict in simple training style has unexpected keys #4074 (#4079 ) * adding enhancement #4074. * modified behavior to strictly require top level dictionary keys - issue #4074 * pass expected keys to error message and add links as expected top level key	2019-08-06 11:01:25 +02:00
Pavle Vidanović	e1a935d71c	Stopwords for Serbian language. (#4078 ) * Serbian stopwords added. (cyrillic alphabet) * spaCy Contribution agreement included. * Test initialize updated	2019-08-05 10:22:27 +02:00
veer-bains	874bd8c8dd	Fixed syntax error in lang/ko when using python 2 (#4082 ) (closes #4068 ) * fixed syntax error in declaring variables with python 2.7 in spacy/lang/ko/__init__.py * fixed syntax error in declaring variables with python 2.7 in spacy/lang/ko/__init__.py * Update __init__.py * Create veer-bains.md * Update __init__.py fixed syntax errors in variable datatype assignment when calling spacy.blank("ko") with python 2.7	2019-08-05 10:19:32 +02:00
Anastassia	33b14724a5	Update gold corpus code to properly ingest a directory of jsonl… (#4067 ) * Update gold corpus code to properly ingest a directory of jsonlines files In response to: https://github.com/explosion/spaCy/issues/3975 * Update spacy/gold.pyx Co-Authored-By: Ines Montani <ines@ines.io>	2019-08-02 09:58:51 +02:00
Mohammed Daudali	23ec07debd	Correct typo for AllenAI url on homepage (#4050 ) * Typo fix for AllenAI url Changed incorrect home page url for AllenAI from appenai.org to allenai.org * Sign contributor agreement * Change date format	2019-07-31 00:16:33 +02:00
Bae Yong-Ju	05fbf5d976	Fix error when Korean text contains regexp special characters. (#4022 )	2019-07-25 17:53:33 +02:00
Falak Asad	ff1e73e35c	Bugfix/issue 3968 (#3982 ) * Fix for issue-3968 * Added contributor agreement * Made suggested changes	2019-07-18 00:20:32 +02:00
pmbaumgartner	931e87f927	contributor agreement	2019-07-14 20:46:06 -04:00
yash	d5311b3c42	Add test file for issue (#3625 ) and spacy contributor agreement	2019-07-11 14:53:14 +05:30
cedar101	58f06e6180	Korean support (#3901 ) * start lang/ko * add test codes * using natto-py * add test_ko_tokenizer_full_tags() * spaCy contributor agreement * external dependency for ko * collections.namedtuple for python version < 3.5 * case fix * tuple unpacking * add jongseong(final consonant) * apply mecab option * Remove Pipfile for now Co-authored-by: Ines Montani <ines@ines.io>	2019-07-09 22:23:16 +02:00
Alex	a795fbd3b2	added contributor agreement ameyuuno.md (#3925 ) @ines hi! I asked to change my username (yuukos -> ameyuuno). So I added a new contributor agreement.	2019-07-09 10:09:52 +02:00
Joshua Smith	e8420ab2b7	Added support for serializing overwrite and ent_id_sep (#3918 ) * Perserve flags in EntityRuler The EntityRuler (explosion/spaCy#3526) does not preserve overwrite flags (or `ent_id_sep`) when serialized. This commit adds support for serialization/deserialization preserving overwrite and ent_id_sep flags. * add signed contributor agreement * flake8 cleanup mostly blank line issues. * mark test from the issue as needing a model The test from the issue needs some language model for serialization but the test wasn't originally marked correctly. * remove unneeded model loading The model didn't need to be loaded, and I replaced it with a change that doesn't require it (using existings fixtures) * change tempdir handling to be compatible with python 2.7 * Adds code to handle item saved before this change. This code chanes how the save files are handled and how the bytes are stored as well. This code adds check to dispatch correctly if it encounters bytes or files saved in the old format (and tests for those cases). * use util function for tempdir management Updated after PR comments: this code now uses the make_tempdir function from util instead of doing it by hand.	2019-07-08 17:28:28 +02:00
Knut O. Hellan	a54f0cfc2b	Norwegian tweaks (#3894 ) * Norwegian fix Add support for alternative past tense verb form (vaska). * Norwegian months Add all Norwegian months to tokenizer excpetions. * More Norwegian abbreviations Add more Norwegian abbreviations to tokenizer_exceptions. * Contributor agreement khellan Add signed contributor agreement for khellan (Knut O. Hellan).	2019-07-08 10:28:47 +02:00
Patrick Hogan	8c0586fd9c	Update example and sign contributor agreement (#3916 ) * Sign contributor agreement for askhogan * Remove unneeded `seen_tokens` which is never used within the scope	2019-07-08 10:27:20 +02:00
Rokas Ramanauskas	61ce126d4c	Lithuanian language support (#3895 ) * initial LT lang support * Added more stopwords. Started setting up some basic test environment (not complete) * Initial morph rules for LT lang * Closes #1 Adds tokenizer exceptions for Lithuanian * Closes #5 Punctuation rules. Closes #6 Lexical Attributes * test: add native examples to basic tests * feat: add tag map for lt lang * fix: remove undefined tag attribute 'Definite' * feat: add lemmatizer for lt lang * refactor: add new instances to lt lang morph rules; use tags from tag map * refactor: add morph rules to lt lang defaults * refactor: only keep nouns, verbs, adverbs and adjectives in lt lang lemmatizer lookup * refactor: add capitalized words to lt lang lemmatizer * refactor: add more num words to lt lang lex attrs * refactor: update lt lang stop word set * refactor: add new instances to lt lang tokenizer exceptions * refactor: remove comments form lt lang init file * refactor: use function instead of lambda in lt lex lang getter * refactor: remove conversion to dict in lt init when dict is already provided * chore: rename lt 'test_basic' to 'test_text' * feat: add more lt text tests * feat: add lemmatizer tests * refactor: remove unused imports, add newline to end of file * chore: add contributor agreement * chore: change 'en' to 'lt' in lt example description * fix: add missing encoding info * style: add newline to end of file * refactor: use python2 compatible syntax * style: reformat code using black	2019-07-08 10:25:22 +02:00
Guillaume Claret	d7a519a922	Typo (#3865 ) * Typo * Add contributor agreement	2019-06-20 10:31:19 +02:00
Alejandro Alcalde	4866a7ee9e	Changed learning rate by its param name. (#3855 ) * Changed learning rate by its param name. I've been searching for a while how the parameter learning rate was named, with `beta1` and `beta2` its easy as they are marked as code, but learning rate wasn't. I think writing the actual parameter name would be helpful. * Signing SCA	2019-06-20 10:29:20 +02:00
Greg Werner	9041a72d7f	Update tokenizer.md for construction example (#3790 ) * Update tokenizer.md for construction example Self contained example. You should really say what nlp is so that the example will work as is * Update CONTRIBUTOR_AGREEMENT.md * Restore contributor agreement * Adjust construction examples	2019-06-16 14:32:56 +02:00


498 Commits