spaCy

mirror of https://github.com/explosion/spaCy.git synced 2024-12-27 10:26:35 +03:00

Author	SHA1	Message	Date
Hunter Kelly	f28a1c7271	Update call to `mkdir()` to create the parents (#3139 ) * Update call to `mkdir()` to create the parents - Update the call to `output_dir.mkdir()` to also create the parents if needed * don't automatically create parents but fail fast if cannot create directory * add signed contributors agreement for retnuh	2019-01-11 03:02:18 +01:00
Amandine Périnet	ee24e2534d	French lemmatization: adding lemmas for adverbs and irregular lemmas for function words (#3131 ) * adding adverbs and irregular cases for empty words * adding adverbs and irregular cases for empty words * adding adverbs and irregular cases for empty words * updating contributor agreement for amperinet	2019-01-10 15:41:15 +01:00
Mathieu Morey	f07b577fbd	Support CUDA 10 (#3126 ) * ENH support CUDA 10 * Update _instructions.jade	2019-01-09 03:10:45 +01:00
Amandine Périnet	eef11a7a2c	French lemmatization: correcting wrong lemmas in the lookup dictionary (#3104 ) * modifying French lookup that contained wrong lemmas * correcting wrong line breaks on hyphen * adding contributor agreement for amperinet@ * correcting a typo	2019-01-07 14:15:19 +01:00
alvations	9972716e01	Create alvations.md (#3119 )	2019-01-05 13:11:06 +01:00
Álvaro Abella Bascarán	9bc4cc1352	Fix issue 2396 (#3089 ) * Test on #2396: bug in Doc.get_lca_matrix() * reimplementation of Doc.get_lca_matrix(), (closes #2396) * reimplement Span.get_lca_matrix(), and call it from Doc.get_lca_matrix() * tests Span.get_lca_matrix() as well as Doc.get_lca_matrix() * implement _get_lca_matrix as a helper function in doc.pyx; call it from Doc.get_lca_matrix and Span.get_lca_matrix * use memory view instead of np.ndarray in _get_lca_matrix (faster) * fix bug when calling Span.get_lca_matrix; return lca matrix as np.array instead of memoryview * cleaner conditional, add comment	2018-12-29 18:05:52 +01:00
Álvaro Abella Bascarán	6fe276f85d	Fix issue 2396 (#3089 ) * Test on #2396: bug in Doc.get_lca_matrix() * reimplementation of Doc.get_lca_matrix(), (closes #2396) * reimplement Span.get_lca_matrix(), and call it from Doc.get_lca_matrix() * tests Span.get_lca_matrix() as well as Doc.get_lca_matrix() * implement _get_lca_matrix as a helper function in doc.pyx; call it from Doc.get_lca_matrix and Span.get_lca_matrix * use memory view instead of np.ndarray in _get_lca_matrix (faster) * fix bug when calling Span.get_lca_matrix; return lca matrix as np.array instead of memoryview * cleaner conditional, add comment	2018-12-29 18:02:26 +01:00
Jari Bakken	e172f2478e	Add three missing tags from the `nb` tag map (#3085 ) * Contributors agreement for jarib * Add tags from the UD/NORNE dataset that is missing in the nb tag map. Relates to #3082.	2018-12-27 14:48:40 +01:00
Will Price	4a6af0852a	Improve random prefix generation in displaCy arcs (#3096 ) * Improve random prefix generation in displaCy arcs * Add @willprice contributor agreement	2018-12-27 14:46:02 +01:00
Özcan Kasal	b573ebca77	trilyon forgotten (#3083 ) * trilyon forgotten * contributor added	2018-12-27 14:44:23 +01:00
Ken	5f0c5fbfa4	issue #3012 : add test (#3021 ) * issue #3012: add test * add contributor agreement * Make test work without models and fix typos ten.pos_ instead of ten.orth_ and comparison against "10" instead of integer 10	2018-12-18 15:02:49 +01:00
Kirill Bulygin	2fb004832f	Fix the first `nlp` call for `ja` (closes #2901 ) (#3065 ) * Fix the first `nlp` call for `ja` (closes #2901) * Add unicode declaration, formatting and use relative import	2018-12-18 15:01:06 +01:00
Kirill Bulygin	10189d9092	Fix the first `nlp` call for `ja` (closes #2901 ) (#3065 ) * Fix the first `nlp` call for `ja` (closes #2901) * Add unicode declaration, formatting and use relative import	2018-12-18 14:53:50 +01:00
Brixjohn	52f3c95004	Added alpha support for Tagalog language (#3062 ) I have added alpha support for the Tagalog language from the Philippines. It is the basis for the country's national language Filipino. I have heavily based the format on the EN and ES languages. I have provided several words in the lemmatizer lookup table, added stop words from a source, translated numeric words to their Tagalog counterparts, added some tokenizer exceptions, and kept the tag map the same as the English language. While the alpha language passed the preliminary testing that you provided, I think it needs more data to be useful for most cases. * Added alpha support for Tagalog language * Edited contributor template * Included SCA; Reverted templates * Fixed SCA template * Fixed changes in SCA template	2018-12-18 13:08:38 +01:00
Amandine Périnet	361554f629	Lemmatization of Adjectives - French : adding rules and vocabulary (#3045 ) * modifying FR lemmatisation for Adjectives * adding contributor agreement for amperinet * correcting some errors in vocabulary files	2018-12-16 18:11:07 +01:00
Aki Ariga	7fcd6419ff	Update the document for Unidic link with latest version URL (#3022 ) * Update Unidic link for latest version in document This patch improves #3017 . The Unidic link pointed to an old version, so this updates it to the latest version. * Add contributor agreement * Use more specific link for unidic-cwj	2018-12-07 17:24:48 +01:00
Amandine Périnet	2457318b7a	Lemmatization of Verbs - French : adding rules and vocabulary (#3006 ) * updating rules and vocabulary for French lemmatization of verbs * updating the file with French auxiliary verb * updating rules and vocabulary for French lemmatization of verbs * adding contributor agreement for amperinet * adding rules for words with inclusive parentheses wrongly tokenized	2018-12-06 15:49:28 +01:00
Beate Sildnes	f0d7e206ec	Updated wordforms for Norwegian lemmatizer (#3007 ) * Updated wordforms for Norwegian lemmatizer Upload of updated lists of wordforms for the Norwegian lemmatizer (nouns, verbs, adverbs, adjectives and lookup). * Add spaCy contributor agreement for user beatesi * Updated wordforms for Norwegian lemmatizer	2018-12-06 15:46:18 +01:00
Gavriel Loria	ae5601beae	Initialize trues to 0.0 in training example (#3004 ) * added contributor agreement * if there are no true positives, precision should be 0.0	2018-12-03 01:33:22 +01:00
wxv	06820ef6e7	Fix is_ascii documentation and create contributor file (#2988 ) Proposed in #2933	2018-11-30 15:57:58 +01:00
Sofie	585de273cd	Fix small typo bug in French regexp + relevant unit test (#2980 ) * additional unit test for new entr word not in other lists * bugfix - unit test works * use _latin_lower instead of alpha_lower for french * revert back to ALPHA_LOWER (following the code for languages) * contributor agreement	2018-11-29 20:16:13 +01:00
Adam Schwalm	00566949de	Fix bug where Vocab.prune_vector did not use 'batch_size' (#2977 ) Fixes #2976	2018-11-28 19:49:33 +01:00
Marc Puig	98fe1ab259	Catalan Language Support (#2940 ) * Catalan language Support * Adding Catalan to documentation	2018-11-26 15:25:47 +01:00
Shawn Cicoria	7601ae0cff	fixes symbolic link on py3 and windows (#2949 ) * fixes symbolic link on py3 and windows during setup of spacy using command python -m spacy link en_core_web_sm en closes #2948 * Update spacy/compat.py Co-Authored-By: cicorias <cicorias@users.noreply.github.com>	2018-11-24 15:34:23 +01:00
Francisco Aranda	be99f1cac5	Include universe spec for spacy-wordnet component (#2919 ) * feat: include universe spec for spacy-wordnet component * chore: include spaCy contributor agreement	2018-11-13 23:54:46 +01:00
mikelibg	75e7d503b7	Removed space in docs + added contributor info (#2909 ) * - removed unneeded space in documentation * - added contributor info	2018-11-08 14:18:25 +01:00
Bram Vanroy	071789467e	Documentation improvement regarding joblib and SO (#2867 ) Some documentation improvements ## Description 1. Fixed the dead URL to joblib 2. Fixed Stack Overflow brand name (with space) ### Types of change Documentation ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2018-10-24 15:19:17 +02:00
JKhakpour	74a30d883c	Add Persian(Farsi) language support (#2797 )	2018-10-13 15:31:49 +02:00
Marina Lysyuk	b76fe08308	Correcting lang/ru/examples.py (#2845 ) * Correct some grammatical inaccuracies in lang\ru\examples.py; filled Contributor Agreement * Correct some grammatical inaccuracies in lang\ru\examples.py * Move contributor agreement to separate file	2018-10-13 15:19:43 +02:00
Jacopo Farina	42c42376a3	Visual C++ link updated (#2842 ) (closes #2841 ) [ci skip] * New landing page * Add contribution agreement	2018-10-12 14:59:45 +02:00
Przemysław Hojnacki	966b583d5e	agreement of contributor, may I introduce a tiny pl language contribution (#2799 ) * Contributors agreement * Contributors agreement * Contributors agreement	2018-09-27 12:25:22 +02:00
Charles-Axel Dein	94ad3c55f1	Add charlax's contributor agreement (#2805 )	2018-09-27 12:24:42 +02:00
darindf	8227566805	Fix error (#2802 ) * Fix error ValueError: cannot resize an array that references or is referenced by another array in this way. Use the resize function * added spaCy Contributor Agreement	2018-09-26 21:31:03 +02:00
Keshan	9a016d17c2	Adding basic support for Sinhala language. (#2788 ) * adding Sinhala language package, stop words, examples and lex_attrs. * Adding contributor agreement * Updating contributor agreement	2018-09-25 12:18:25 +02:00
John Stewart	2d15859d2a	Fixed spaCy+Keras example (#2763 ) * bug fixes in keras example * created contributor agreement	2018-09-15 13:06:39 +02:00
Andrew Ongko	81564cc4e8	Update Indonesian model (#2752 ) * adding e-KTP in tokenizer exceptions list * add exception token * removing lines containing spaces, as they won't matter since we use .split() method in the end, added new tokens in exception * add tokenizer exceptions list * combining base_norms with norm_exceptions * adding norm_exception * fix double key in lemmatizer * remove unused import on punctuation.py * reformat stop_words to reduce number of lines, improve readability * updating tokenizer exception * implement is_currency for lang/id * adding orth_first_upper in tokenizer_exceptions * update the norm_exception list * remove bunch of abbreviations * adding contributors file	2018-09-14 12:30:32 +02:00
Filipe Caixeta	fe515085f3	Add words to portuguese language _num_words (#2759 ) * Add words to portuguese language _num_words * Add words to portuguese language _num_words	2018-09-14 12:30:16 +02:00
Grivaz	aeba99ab0d	Introduces a bulk merge function, in order to solve issue #653 (#2696 ) * Fix comment * Introduce bulk merge to increase performance on many span merges * Sign contributor agreement * Implement pull request suggestions	2018-09-10 16:41:42 +02:00
tyburam	476472d181	Lex _attrs for polish language (#2750 ) * Signed spaCy contributor agreement * Added polish version of english lex_attrs	2018-09-10 11:53:57 +02:00
Sainath Adapa	77139bc03c	Basic support for Telugu language (#2751 )	2018-09-10 11:53:18 +02:00
Maxim Kupfer	97e2874225	added contributor agreement for mbkupfer (#2738 )	2018-09-10 11:32:03 +02:00
Piotr Żelasko	bdb2165bd1	Less norm computations in token similarity (#2730 ) * Less norm computations in token similarity * Contributor agreement	2018-09-05 21:50:23 +02:00
Aniruddha Adhikary	4530ddcc51	update bengali token rules for hyphen and digits (#2731 )	2018-09-05 21:49:00 +02:00
Nathaniel J. Smith	26849874ad	When calling getoption() in conftest.py, pass a default option (#2709 ) * When calling getoption() in conftest.py, pass a default option This is necessary to allow testing an installed spacy by running: pytest --pyargs spacy * Add contributor agreement	2018-09-03 09:57:52 +02:00
Arya Prabhudesai	db2c2b286c	Create aryaprabhudesai.md (#2681 )	2018-08-20 18:56:14 +02:00
Wojciech Łukasiewicz	3953e967a0	Use correct variable name in the examples (#2664 ) * correct naming * add contributor agreement	2018-08-13 22:21:24 +02:00
Aashish Gangwani	6eebfc7bf4	Added numbers to ../lang/hi/lex_attrs.py (#2629 ) I have added numbers in the Hindi lex_attrs.py file according to the Indian numbering system (https://en.wikipedia.org/wiki/Indian_numbering_system) and here are their English translations: 'शून्य' => zero 'एक' => one 'दो' => two 'तीन' => three 'चार' => four 'पांच' => five 'छह' => six 'सात' => seven 'आठ' => eight 'नौ' => nine 'दस' => ten 'ग्यारह' => eleven 'बारह' => twelve 'तेरह' => thirteen 'चौदह' => fourteen 'पंद्रह' => fifteen 'सोलह' => sixteen 'सत्रह' => seventeen 'अठारह' => eighteen 'उन्नीस' => nineteen 'बीस' => twenty 'तीस' => thirty 'चालीस' => forty 'पचास' => fifty 'साठ' => sixty 'सत्तर' => seventy 'अस्सी' => eighty 'नब्बे' => ninety 'सौ' => hundred 'हज़ार' => thousand 'लाख' => hundred thousand 'करोड़' => ten million 'अरब' => billion 'खरब' => hundred billion ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2018-08-08 16:06:11 +02:00
Emil Stenström	3834f4146d	Add abbreviations from UD_Swedish-Talbanken (#2613 ) * Add abbreviations from UD_Swedish-Talbanken * Add contributor agreement.	2018-08-07 13:53:17 +02:00
Sami	dbc993f5b3	Updating description and code snippet spacy-lefff (#2623 ) * updating description and code snippet spacy-lefff * contributors agreement	2018-08-02 17:25:27 +02:00
Vikas Kumar Yadav	23876dbc70	Create vikaskyadav.md (#2621 )	2018-08-02 14:03:44 +02:00
Dmitry Bruhanov	4ad7de6ca9	DimaBryuhanov.md (#2590 ) # spaCy contributor agreement This spaCy Contributor Agreement ("SCA") is based on the [Oracle Contributor Agreement](http://www.oracle.com/technetwork/oca-405177.pdf). The SCA applies to any contribution that you make to any product or project managed by us (the "project"), and sets out the intellectual property rights you grant to us in the contributed materials. The term "us" shall mean [ExplosionAI UG (haftungsbeschränkt)](https://explosion.ai/legal). The term "you" shall mean the person or entity identified below. If you agree to be bound by these terms, fill in the information requested below and include the filled-in version with your first pull request, under the folder [`.github/contributors/`](/.github/contributors/). The name of the file should be your GitHub username, with the extension `.md`. For example, the user example_user would create the file `.github/contributors/example_user.md`. Read this agreement carefully before signing. These terms and conditions constitute a binding legal agreement. ## Contributor Agreement 1. The term "contribution" or "contributed materials" means any source code, object code, patch, tool, sample, graphic, specification, manual, documentation, or any other material posted or submitted by you to the project. 2. With respect to any worldwide copyrights, or copyright applications and registrations, in your contribution: * you hereby assign to us joint ownership, and to the extent that such assignment is or becomes invalid, ineffective or unenforceable, you hereby grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge, royalty-free, unrestricted license to exercise all rights under those copyrights. 
This includes, at our option, the right to sublicense these same rights to third parties through multiple levels of sublicensees or other licensing arrangements; * you agree that each of us can do all things in relation to your contribution as if each of us were the sole owners, and if one of us makes a derivative work of your contribution, the one who makes the derivative work (or has it made will be the sole owner of that derivative work; * you agree that you will not assert any moral rights in your contribution against us, our licensees or transferees; * you agree that we may register a copyright in your contribution and exercise all ownership rights associated with it; and * you agree that neither of us has any duty to consult with, obtain the consent of, pay or render an accounting to the other for any use or distribution of your contribution. 3. With respect to any patents you own, or that you can license without payment to any third party, you hereby grant to us a perpetual, irrevocable, non-exclusive, worldwide, no-charge, royalty-free license to: * make, have made, use, sell, offer to sell, import, and otherwise transfer your contribution in whole or in part, alone or in combination with or included in any product, work or materials arising out of the project to which your contribution was submitted, and * at our option, to sublicense these same rights to third parties through multiple levels of sublicensees or other licensing arrangements. 4. Except as set out above, you keep all right, title, and interest in your contribution. The rights that you grant to us under these terms are effective on the date you first submitted a contribution to us, even if your submission took place before the date you sign these terms. 5. 
You covenant, represent, warrant and agree that: * Each contribution that you submit is and shall be an original work of authorship and you can legally grant the rights set out in this SCA; * to the best of your knowledge, each contribution will not violate any third party's copyrights, trademarks, patents, or other intellectual property rights; and * each contribution shall be in compliance with U.S. export control laws and other applicable export and import laws. You agree to notify us if you become aware of any circumstance which would make any of the foregoing representations inaccurate in any respect. We may publicly disclose your participation in the project, including the fact that you have signed the SCA. 6. This SCA is governed by the laws of the State of California and applicable U.S. Federal law. Any choice of law rules will not apply. 7. Please place an “x” on one of the applicable statement below. Please do NOT mark both statements: * [X] I am signing on behalf of myself as an individual and no other person or entity, including my employer, has or will have rights with respect to my contributions. * [ ] I am signing on behalf of my employer or a legal entity and I have the actual authority to contractually bind that entity. ## Contributor Details \| Field \| Entry \| \|------------------------------- \| -------------------- \| \| Name \| Dmitry Briukhanov \| \| Company name (if applicable) \| - \| \| Title or role (if applicable) \| - \| \| Date \| 7/24/2018 \| \| GitHub username \| DimaBryuhanov \| \| Website (optional) \| \|	2018-07-24 18:43:27 +02:00
katarkor	5ca853bee0	changed tag_map, morph_rules, lemmatizer for Norwegian (#2565 ) * changed tag_map, morph_rules, lemmatizer for Norwegian * Move unicode declaration up Hopefully fixes test failure on Python 2 * Update CONTRIBUTOR_AGREEMENT.md * Move unicode declarations Hopefully fixes test this time * Revert "Merge remote-tracking branch 'origin/patch-1'" This reverts commit `f5ccd5dd0d`, reversing changes made to `dd07e180ea`. * Update contributor agreement [ci skip]	2018-07-19 19:38:24 +02:00
kororo	2784babef9	Add ExcelCy into Universe list (#2572 ) Hi guys, This is my first spaCy extension. I am excited to be able to do this. Please do let me know if there are any suggestions or modifications I need to make. Feel free to use/contribute to the repo that I made. ## Description ExcelCy is a spaCy toolkit to help improve the data training experience. It provides easy annotation using the Excel file format. It has a helper to pre-train entity annotation with phrase and regex matcher pipe. ### Types of change Update to Universe list in website. ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [x] I ran the tests, and all new and existing tests passed. - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2018-07-19 19:28:33 +02:00
Ioannis Daras	6ed18412d0	Greek language optimizations (#2558 ) * Greek language optimizations * Add encoding on files containing greek words * Add encoding on files containing greek words	2018-07-18 18:51:38 +02:00
Xiang Ji	19a5ef1c58	Fix venv command examples (#2560 ) [ci skip] * Fix venv command examples The documentation refers to `venv`, which is native to Python3. However, the command examples are as if they were still `virtualenv`, which is a package independent of `venv`: - It doesn't need to be installed via `pip`. In fact `pip install venv` would return an error. - The correct way to invoke `venv` is `python3 -m venv`, not `venv`, which would return command not found. See https://docs.python.org/3/library/venv.html I suspect the documentation simply replaced all occurrences of `virtualenv` with `venv`. However they are different modules and are used differently. * Update comment [ci skip]	2018-07-18 10:31:24 +02:00
Tero K	f35980f865	Enhancement/lang fi examples (#2547 ) * Added a file with examples in finnish * added contributor agreement	2018-07-15 09:50:27 +02:00
Eleni170	6042723535	Add support for Greek language (#2535 ) * Add contributor agreement * Support for Greek language * Fix missing el_tokenizer	2018-07-10 13:48:38 +02:00
Bùi Trung Chí	9af46b4f1b	Fix loading tokenizer with custom prefix search (#2495 ) * Add contributor agreement * Fix loading tokenizer with custom prefix search	2018-07-04 12:56:07 +02:00
Muhammad Irfan	f33c703066	Add Urdu Language Support (#2430 ) * added Urdu language support. * added Urdu language tests. * modified conftest.py for Urdu language support. * added spacy contributor agreement.	2018-06-22 11:14:03 +02:00
himkt	14d9007efd	fix wrong indexing (#2416 ) * fix wrong indexing * add agreement	2018-06-19 10:20:57 +02:00
Aliia E	428bae66b5	Add Tatar Language Support (#2444 ) * add Tatar lang support * add Tatar letters * add Tatar tests * sign contributor agreement * sign contributor agreement [x] * remove comments from Language class * remove all template comments	2018-06-19 10:17:53 +02:00
Cory Hurst	446f5ec41b	Silent keyword in info function in init (#2459 ) * Pass through "silent" kwarg to the wrapper in the spacy module init. reference issue #2196 * Pass through "silent" kwarg to the wrapper in the spacy module init. reference issue #2196 * contributor agreement	2018-06-18 12:24:21 +02:00
Daniel Ruf	d6d688914f	chore: cache dependencies (#2418 ) * chore: cache dependencies * chore: add CLA	2018-06-11 00:22:41 +02:00
himkt	1a568f2e08	fix wrong documentations (#2423 )	2018-06-11 00:21:06 +02:00
Bohdan Moskalevskyi	d66292f767	fix UD data file extensions (#2425 ) * fix UD data files extension * add contributor agreement for msklvsk	2018-06-08 14:26:11 +02:00
Nour Shalabi	a169b79092	Additions to Arabic stop words. (#2422 ) * Additions to Arabic stop words. * Create nourshalabi.md	2018-06-08 02:33:23 +02:00
Maciej	c7d53348d7	Fix bug in CLI iob and ner converter (#2392 ) (fixes #2385 ) * issue_2385 add tests for iob_to_biluo converter function * issue_2385 fix and modify iob_to_biluo function to accept either iob or biluo tags in cli.converter * issue_2385 add test to fix b char bug * add contributor agreement * fill contributor agreement	2018-05-30 12:28:44 +02:00
ansgar-t	9732988951	escape html in displacy.render (#2378 ) (closes #2361 ) ## Description Fix for issue #2361 : replace &, <, >, " with &amp; , &lt; , &gt; , &quot; before rendering svg ## Checklist - [x] I have submitted the spaCy Contributor Agreement. - [ ] I ran the tests, and all new and existing tests passed. (As discussed in the comments to #2361) - [x] My changes don't require a change to the documentation, or if they do, I've added all required information.	2018-05-28 18:36:41 +02:00
Samuel Pouyt	d85494bfae	Added agreement (#2374 )	2018-05-26 18:19:08 +02:00
James Messinger	4515e96e90	Better formatting for `spacy train` CLI (#2357 ) * Better formatting for `spacy train` CLI Changed to use fixed-spaces rather than tabs to align table headers and data. ### Before: ``` Itn. P.Loss N.Loss UAS NER P. NER R. NER F. Tag % Token % 0 4618.857 2910.004 76.172 79.645 67.987 88.732 88.261 100.000 4436.9 6376.4 1 4671.972 3764.812 74.481 78.046 62.374 82.680 88.377 100.000 4672.2 6227.1 2 4742.756 3673.473 71.994 77.380 63.966 84.494 90.620 100.000 4298.0 5983.9 ``` ### After: ``` Itn. Dep Loss NER Loss UAS NER P. NER R. NER F. Tag % Token % CPU WPS GPU WPS 0 4618.857 2910.004 76.172 79.645 67.987 88.732 88.261 100.000 4436.9 6376.4 1 4671.972 3764.812 74.481 78.046 62.374 82.680 88.377 100.000 4672.2 6227.1 2 4742.756 3673.473 71.994 77.380 63.966 84.494 90.620 100.000 4298.0 5983.9 ``` * Added contributor file	2018-05-25 13:08:45 +02:00
Aristo Rinjuang	432ede04af	adding more words and rephrasing (#2351 ) * adding more words and rephrasing * adding a contributor * tokenizer bugs solved	2018-05-24 11:40:57 +02:00
Shantam Raj	1a4682dd0b	Update _training.jade (#2340 ) * Update _training.jade Correcting grammar. Replacing "The" with "To". * Create armsp.md * Update armsp.md	2018-05-21 11:09:33 +02:00
Tahar Zanouda	00417794d3	Add Arabic language (#2314 ) * added support for Arabic lang * added Arabic language support * updated conftest	2018-05-15 00:27:19 +02:00
vishnumenon	ae3719ece5	Fix the code for FACILITY entities (#2324 ) * Fix the code for FACILITY entities As far as I can tell, the default models all use "FAC" rather than "FACILITY" * Added my Contributor Agreement * Rename vishnumenon to vishnumenon.md	2018-05-12 15:19:17 +02:00
Jani Monoses	42b34832e4	Update Romanian stopword list (#2316 ) * Contributor agreement for janimo * Update Romanian stopword list Include the correct spellings of all the words already in the repo that are using cedillas (ş and ţ) instead of commas (ș and ț). Add another unrelated spelling fix. See https://github.com/stopwords-iso/stopwords-ro/pull/1 and https://github.com/stopwords-iso/stopwords-ro/pull/2	2018-05-10 12:16:56 +02:00
Lucas Abbade	18af53014f	Adding my contributor agreement (#2315 ) * Create LRAbbade.md * Update LRAbbade.md	2018-05-09 21:25:05 +02:00
mauryaland	5368ba028a	Update stop_words.py for French language (#2310 ) * Add contraction forms of some common stopwords All the stopwords added contain the apostrophe" ' "or " ’ ". * Adds contributor agreement mauryaland * Update mauryaland.md	2018-05-09 12:04:38 +02:00
ines	37facf9b4d	Add config for no-response [ci skip]	2018-05-07 22:04:54 +02:00
ines	a685fff875	Merge branch 'master' of https://github.com/explosion/spaCy	2018-05-07 18:58:57 +02:00
ines	e2241c797c	Add lock-threads configuration [ci skip]	2018-05-07 18:54:22 +02:00
B!	414f5270b3	B Cavello's signed Contributor Agreement v2 (#2302 ) This time hopefully created in the right spot. (Sorry about that!)	2018-05-07 17:48:54 +02:00
ines	929a01139a	Order issue templates	2018-05-04 03:04:41 +02:00
Ines Montani	7f39c8896b	Update issue templates (#2295 ) * Update issue templates * Update templates	2018-05-04 03:02:26 +02:00
Douglas Knox	9b49a40f4e	Test and fix for Issue #2219 (#2272 ) Test and fix for Issue #2219: Token.similarity() failed if single letter	2018-05-03 18:40:46 +02:00
G.Pruvost	cc8e804648	#2211 - Support for ssl certs config on download command (#2212 ) * Add support for SSL/Certs customization on download CLI * Add a note on SSL options for the 'download' CLI in the README * Add contributor agreement	2018-05-03 18:37:02 +02:00
Alex Villarreal	13d562e1a4	Fix code sample for Doc.set_extension (#2282 ) * Fix code sample for `set_extension` The previous sample code for `set_extension` fails the assertion at the end, because `city_getter` it checked if the whole document text matches any of the city names. Now it checks if any of the city names is contained in the document text. * Contributor agreement	2018-05-02 10:16:05 +02:00
Mr Roboto	6f5ccda19c	Addresses Issue #2228 - Deserialization fails when using tensor=False or sentiment=False (#2230 ) * Fixes issue #2228 * Adds a new contributor	2018-05-01 13:40:22 +02:00
Shirish Kadam	d98a90440f	Added Adam project to spaCy Universe (#2275 ) * Added 5hirish to contributors * Added Adam Qas Project to spaCy Universe * Remove $ from code example	2018-04-30 22:25:01 +02:00
Matt Upson	87cc6b3599	Add missing comma to NN example in docs (#2255 ) Also add a completed contributor agreement.	2018-04-28 14:56:00 +02:00
Robin Linderborg	d01f503b54	Remove incorrect lemma lookup gäng->gänga (#2252 ) * Remove incorrect lemma lookup gäng->gänga In modern Swedish, "gäng" is mostly associated with "gang" or "group of people". The removed lemma lookup lemmatized it to the verb "thread". * Add contrib agreement to correct directory * Revert change to CONTRIBUTOR_AGREEMENT	2018-04-28 14:54:41 +02:00
Jens Dahl Møllerhøj	e5055e3cf6	Add Danish lemmatizer (#2184 ) * add danish lemmatizer * fill contributor agreement	2018-04-07 19:07:28 +02:00
ines	638068ec6c	Restore contributor agreement	2018-03-31 14:06:37 +02:00
Suraj Rajan	1cdbb7c97c	[2032] - Changed python set to cpp stl set (#2170 ) Changed python set to cpp stl set #2032 ## Description Changed python set to cpp stl set. CPP stl set works better due to the logarithmic run time of its methods. Finding minimum in the cpp set is done in constant time as opposed to the worst case linear runtime of python set. Operations such as find,count,insert,delete are also done in either constant and logarithmic time thus making cpp set a better option to manage vectors. Reference : http://www.cplusplus.com/reference/set/set/ ### Types of change Enhancement for `Vectors` for faster initialising of word vectors(fasttext)	2018-03-31 13:28:25 +02:00
Katrin Leinweber	6f84e32253	Formalise citation info (#2167 ) * Create CITATION file * Add Katrinleinweber contributor agreement	2018-03-30 10:34:14 +02:00
Viet Trung Tran	ea2af94cd9	Add support for Vietnamese in spaCy by leveraging Pyvi, an external Vietnamese tokenizer (#2155 ) * support for Vietnamese * Contributor Agreement for adding Vietnamese support on spaCy	2018-03-29 12:19:51 +02:00
ines	6173c4aaa6	Port over contributor agreements	2018-03-24 17:17:37 +01:00
Aaron Marquez	c7926f72eb	add contributor agreement for @enerrio	2018-02-15 12:43:04 -08:00
Claudiu-Vlad Ursache	cdd4b3d05c	Add contributor agreement for @ursachec	2018-02-13 20:49:42 +01:00
Johannes Dollinger	012e874d09	Add contributor agreement for emulbreh	2018-02-13 13:40:33 +01:00
Lyndon White	94ce43adf0	squashme	2018-02-09 23:19:11 +08:00
Lyndon White	5b1bc8d101	Sign contributors agreement	2018-02-09 23:14:29 +08:00
Pradeep Kumar Tippa	f1911ef73a	Added pktippa contributor agreement	2018-02-07 15:37:28 +05:30
sayf eddine hammemi	35272eade8	Accept contributor agreement.	2018-02-04 20:48:45 +01:00
Adam Binford	1a2c2f7d7f	Fixed auto linking after download and added simple test to check	2018-01-29 14:25:21 -05:00
Matthew Honnibal	cb7110c22e	Merge pull request #1882 from ohenrik/nb_lemma_and_tag_map Add norwegian bokmål ('nb') lemmatizer and tag_map	2018-01-29 18:18:50 +01:00
Thomas Opsomer	f35895d81b	add contributor agreement	2018-01-28 20:12:05 +01:00
Ole Henrik Skogstrøm	bbc758526c	Added contributors agreement	2018-01-25 11:05:29 +01:00
Ali Zarezade	c27c7bf0e0	add contributors.md	2018-01-23 13:47:30 +03:30
Avadh Patel	5029d65738	Signed contributor agreement Signed-off-by: Avadh Patel <avadh4all@gmail.com>	2018-01-17 06:33:37 -06:00
Ines Montani	36f426fe0a	Merge pull request #1808 from fucking-signup/master Fix issue #1769	2018-01-12 21:12:02 +00:00
Ines Montani	b52f5fb05d	Merge pull request #1830 from Babylonpartners/external-release Signed the contributor agreement	2018-01-11 19:00:30 +00:00
Sasho Savkov	84d65873d2	Renamed the file	2018-01-11 17:49:29 +00:00
Sasho Savkov	a1d2d1f263	Signed the contributor agreement Looking forward to contributing some code :)	2018-01-11 17:46:31 +00:00
pbnsilva	78383f38a6	Adds contributor agreement	2018-01-11 17:40:12 +01:00
Kit	dba6adea65	Add contributor agreement	2018-01-08 03:08:57 +01:00
Kevin Humphreys	6173b697a7	add agreement	2018-01-03 13:00:14 -08:00
zqhZY	29898946cd	add contributors.md	2017-12-28 18:04:52 +08:00
Ines Montani	97f100f69f	Merge pull request #1742 from kimfalk/master Two corrections in the da lan.	2017-12-20 21:02:00 +00:00
Ines Montani	d682a8803e	Merge pull request #1672 from cbilgili/master Adds Turkish Lemmatization	2017-12-20 21:01:00 +00:00
ines	5e5d47fe50	Add contributor agreement (see #1672)	2017-12-20 22:00:12 +01:00
Kim FalkJørgensen	fc7cf85af5	agreeing to the contributor agreement.	2017-12-19 15:31:52 +01:00
Martin Andrews	67de1ad11e	Create mdda.md	2017-12-18 18:09:27 +08:00
Ines Montani	1a400ac874	Rename d99kris to d99kris.md	2017-12-17 13:44:55 +01:00
Kristofer Berggren	cacdf4ad19	Add d99kris to contributors Add myself (d99kris) to spaCy Contributor Agreement, for PR https://github.com/explosion/spaCy/pull/1731	2017-12-17 20:43:23 +08:00
Bri-Will	afd9fc9d36	Adds contributor agreement for Bri-Will	2017-12-11 14:38:37 -08:00
Isaac Sijaranamual	f32c6630cb	Adds contributor agreement IsaacHaze	2017-12-10 23:15:06 +01:00
Ines Montani	51d3ab2137	Revert contributor agreement to empty form	2017-12-07 16:22:30 +01:00
Canbey Bilgili	86ac8ea5ba	Adds Canbey Bilgili's Contributor Agreement	2017-12-01 17:27:41 +03:00
Matthew Honnibal	6bc0f4d29f	Merge pull request #1611 from fsonntag/master Solving #1494	2017-11-29 23:11:23 +01:00
Matthew Honnibal	f9ed9ea529	Merge pull request #1624 from GreenRiverRUS/russian Add support for Russian	2017-11-29 23:10:01 +01:00
Hugo	88d829f60c	CLA	2017-11-29 10:25:20 +02:00
Vadim Mazaev	49b4e2c158	Added contributor agreement	2017-11-26 22:14:08 +03:00
Søren Lind Kristiansen	b91986b726	Add contributor agreement.	2017-11-24 15:29:54 +01:00
markulrich	c9b63c0dfc	Use correct local parameter in example MyComponent (and added markulrich.md contributor file)	2017-11-22 15:59:08 -08:00
Burton DeWilde	833c66c9b2	Add contributor agreement	2017-11-20 11:28:31 -06:00
cclauss	31085dcbb6	Create cclauss.md	2017-11-20 14:57:30 +01:00
Felix Sonntag	ada4712250	Add contributer aggreement	2017-11-19 16:30:35 +01:00
Motoki Wu	7b5b49eef0	added contributor agreement	2017-11-17 17:27:20 -08:00
Martino Mensio	239a0f391d	added contributor agreement	2017-11-17 16:30:09 +01:00
Ines Montani	339675c9fb	Merge pull request #1565 from DuyguA/patch-2 added contributor agreement for DuyguA	2017-11-13 16:21:50 +01:00
Duygu Altinok	c263c3acce	added contributor agreement for DuyguA	2017-11-13 15:45:13 +01:00
Abhinav Sharma	4dd34058a2	Create abhi18av.md	2017-11-13 17:23:05 +05:30
Roman Domrachev	378280039b	Fill contributer agreement	2017-11-11 11:39:31 +03:00
Mathias Deschamps	b639b4c6b4	Add spaCy Contributor Agreement	2017-11-09 11:56:47 +01:00
Daniel Hershcovich	6eb4a41316	Signed contributor agreement	2017-11-08 16:28:56 +02:00
Abhinav Sharma	d097b34059	Update CONTRIBUTOR_AGREEMENT.md	2017-11-08 14:16:04 +05:30
Abhinav Sharma	c9c4aaec44	corrected a typo	2017-11-08 13:33:15 +05:30
uwol	9c9ed7890a	added contributor agreement	2017-11-05 12:33:43 +01:00
Abhinav Sharma	2aaf5315f3	Filled the details of the contribution license	2017-11-03 16:56:58 +05:30
Jim O'Regan	41dd29e48e	merge	2017-10-31 14:07:45 +00:00
ines	1730648e19	Update pull request template	2017-10-24 21:49:11 +02:00
ines	7459ecfa87	Port over contributor agreements	2017-10-24 20:13:34 +02:00
Ramanan Balakrishnan	7b9b1be44c	Support single value for attribute list in doc.to_array	2017-10-19 17:00:41 +05:30
Jim O'Regan	3c4d83aa6e	CLA	2017-06-26 21:32:48 +01:00
Ines Montani	bc88f9865e	Remove file (already covered in PR)	2017-04-27 11:17:30 +02:00
Leif Uwe Vogelsang	13ce4c96b1	Update luvogels.md	2017-04-27 10:42:07 +02:00
Leif Uwe Vogelsang	e136c51393	Update Alpha_support_Norwegian bokmål.md	2017-04-26 23:24:11 +02:00
luvogels	cbfe4920bb	Added contributor agreement and pull request doc	2017-04-26 18:02:34 +02:00
oeg	b10bc1a177	Adds contributor agreement dvsrepo	2017-04-07 11:58:28 +02:00
Ines Montani	041f402352	Update ISSUE_TEMPLATE.md	2017-03-18 22:01:54 +01:00
shuvanon	8a2d22222d	filled up CONTRIBUTOR_AGREEMENT.md	2017-03-12 17:07:55 +06:00
Michael Wallin	55b1e5e682	[finnish] Add contributor file	2017-02-04 13:54:10 +02:00
Ines Montani	b50c499c04	Fix consistency	2017-01-16 20:44:31 +01:00
Ines Montani	8a615e8961	Simplify and update pull request template	2017-01-16 20:43:52 +01:00
Ines Montani	05b3668916	Remove bold formatting as it occasionally causes markup errors	2017-01-09 20:26:09 +01:00
Ines Montani	aa876884f0	Revert "Revert "Merge remote-tracking branch 'origin/master'"" This reverts commit `fb9d3bb022`.	2017-01-09 13:28:13 +01:00
Gyorgy Orosz	ade7487ff8	Accepted contributor agreement.	2016-12-26 22:37:02 +01:00
Magnus Burton	db5a077d2b	Initial commit for Swedish	2016-12-20 11:05:06 +01:00
dafnevk	af761fd664	Signed Contributer Agreement by Rob van Nieuwpoort	2016-12-15 10:34:19 +01:00
Ines Montani	1515434eaa	Fix wording	2016-11-09 17:29:20 +01:00
Ines Montani	8cd361e319	Fix paths	2016-11-09 17:20:35 +01:00
Ines Montani	592d244484	Re-add existing contributor agreements	2016-11-09 16:42:02 +01:00
Ines Montani	8239878dce	Update CONTRIBUTOR_AGREEMENT.md	2016-11-09 16:30:32 +01:00
Ines Montani	3005f76652	Create CONTRIBUTOR_AGREEMENT.md	2016-11-09 16:23:55 +01:00
Ines Montani	cd89f6b602	Add line break	2016-11-03 00:03:37 +01:00
Ines Montani	eea9f1aab4	Make small changes and update StackOverflow text	2016-11-03 00:02:49 +01:00
Sam Bozek	3bc4c6bbab	Minor grammar tweak to environment section	2016-10-28 23:32:11 -07:00
Sam Bozek	39a2e993f1	Add link to spaCy stackoverflow tag	2016-10-28 23:27:36 -07:00
Sam Bozek	f0eba7f568	Tightening up ISSUE_Template	2016-10-28 23:21:50 -07:00
Sam Bozek	c2d2b12ef7	Tightening up ISSUE_Template	2016-10-28 23:21:32 -07:00
Sam Bozek	1f8826e906	Tracking proper file: ISSUE_TEMPLATE.md	2016-10-27 23:19:18 -07:00
Sam Bozek	071989fd15	Renamed ISSUES.md to proper template convention of ISSUE_TEMPLATE.md	2016-10-27 23:18:38 -07:00
Sam Bozek	b66c4c3788	Formatted the steps to be in neat ordered list. Minor grammar adjustments.	2016-10-27 23:09:28 -07:00
Sam Bozek	932f4c8469	Finished ISSUES.md * Added section for reproducing bug * Promote step by step reproduction of code, see if anything has been incorrectly done. * Context section for extra details that might be helpful for issue tracking * Final Checklist. Verify bug can be reproduced and details are present.	2016-10-27 23:03:33 -07:00
Sam Bozek	bd48d44fa7	Started work on ISSUES template. Want to have small section to address each issue area: * What happened/what did you expect? * Fix suggestions/reason bug happened. * How to reproduce for other contributors to replicate. * Look at how workflow was affected by issue. * As much detail as they can provide about their setup.	2016-10-27 13:46:35 -07:00
Sam Bozek	3e1dbbd52a	Clarified a few statements on the pull request template. * Added spaCy to several areas to avoid confusion * Simplified a few statements and checkboxes.	2016-10-26 18:27:56 -07:00
Sam Bozek	63b7c2ef61	Added pull request template. Making following assumptions: * Pull requests do not need to originate from Issues discussion. * But encouraged * No current CONTRIBUTING.md file * honor code that people follow current coding conventions * Tests run and passed * New features require additonal tests for confirmation	2016-10-26 18:20:08 -07:00

337 Commits