Bruno
1a77607036
spaCy v3 is not saving the best version in training loop ( #6629 )
...
* Save best only if is the best and also respect the average config
* Create bratao.md
* Update loop.py
* Remove average check
* Keep before_to_disk
2021-01-06 12:51:30 +11:00
Yosi
cf52510631
Add Amharic አማርኛ Language support ( #6583 )
...
* Add Amharic to space
* clean up
* Add some PRON_LEMMA
* add Tigrinya support
* remove text_noun_chunks
* Tigrinya Support
* added some more details for ti
* fix unit test
* add amharic char range
* changes from review
* amharic and tigrinya share same unicode block
* get rid of _amharic/_tigrinya in char_classes
Co-authored-by: Josiah Solomon <jsolomon@meteorcomm.com>
2020-12-22 16:50:34 +01:00
Ines Montani
d8aa113d16
Merge pull request #6566 from rafguns/cite-zenodo [ci skip]
2020-12-16 16:40:50 +11:00
Thomas Bird
f6e4378942
Add SCA for @thomasbird ( #6576 )
2020-12-15 20:59:47 +01:00
Raf Guns
a90ca0e1fb
Add contributor agreement
2020-12-14 22:01:14 +01:00
Adriane Boyd
724831b066
Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master
...
* Update Macedonian for v3
* Update Turkish for v3
2020-11-25 11:49:34 +01:00
Jacob Bortell
992723dfac
Add jabortell to the contributors ( #6422 )
...
* Add jabortell to the contributors
* Update jabortell.md
Added tick to applicable statement
2020-11-24 16:15:31 +01:00
Yusuke Mori
e3ac90b035
Avoid a SyntaxError in self-attentive-parser ( #6428 )
...
* Avoid a SyntaxError in self-attentive-parser
Fix a usage of quotation marks in the example of spaCy Universe self-attentive-parser
* Create forest1988.md
Fill in the spaCy contributor agreement
2020-11-22 21:59:37 +01:00
M. Revuelta Espinosa
51232ffb9e
Update universe.json (include PatternOmatic) ( #6399 )
...
Request to include PatternOmatic in spaCy Universe
Adds @revuel to contributors
2020-11-19 13:15:50 +01:00
Daniel Vasic
20d72de986
Added Multext-East V5 tagset for Croatian language ( #6248 )
...
* Added Multext-East V5 tagset for Croatian language
* Create danielvasic.md
* Update danielvasic.md
* Update danielvasic.md
* Add tag map to CroatianDefaults
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-11-05 12:19:22 +01:00
Vu Ha
6d465ec52c
add oprd to the list of accepted deps for noun chunking ( #6302 )
...
* add oprd to the list of accepted deps for noun chunking
* add SCA
2020-11-05 09:17:35 +01:00
Robert Šípek
260c29794a
Fill contributor agreement by robertsipek ( #6285 )
...
* Fill contributor agreement by robertsipek
* Fill contributor agreement by robertsipek
2020-10-22 22:13:17 +02:00
Kunal Sharma
01aec7a313
Adding MindMeld to Universe JSON ( #6275 )
...
* Adding Mindmeld to Universe JSON
Mindmeld is a conversational AI platform for deep-domain voice interfaces and chatbots. https://www.mindmeld.com/
* Signing contribution agreement.
Co-authored-by: kunshar2 <kunshar2@cisco.com>
2020-10-21 18:42:11 +02:00
walterhenry
ff82644746
User contributor agreement
...
Here it is!
2020-10-19 16:25:09 +02:00
Jan Margeta
ed1c37189a
Add contributor agreement for jmargeta
2020-10-16 00:38:42 +02:00
Borijan Georgievski
2311192ba1
Include Macedonian language ( #6230 )
...
* Include Macedonian language
* Fix indentation at char_classes.py
* Fix indentation at char_classes.py
* Add Macedonian tests, update lex_attrs and char_classes
* Import unicode literals for python 2
2020-10-15 15:55:01 +02:00
Ines Montani
178760855f
Merge branch 'develop' into master-tmp
2020-10-15 09:06:03 +02:00
Florijan Stamenković
18f5c309dc
Fix Issue 6207 ( #6208 )
...
* Regression test for issue 6207
* Fix issue 6207
* Sign contributor agreement
* Minor adjustments to test
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-10-09 10:14:40 +02:00
Šarūnas Navickas
287ba94a2f
Website (Universe): An entry for rita-dsl ( #6138 )
...
* Create zaibacu.md
* Add RITA-DSL entry
* Update agreement
* Fix formatting
2020-10-09 10:14:40 +02:00
delzac
668507be1b
Reflect on usage doc that IS_SENT_START attribute exist ( #6114 )
...
* Reflect on usage doc that IS_SENT_START attribute exist
* Create delzac.md
2020-10-09 10:14:40 +02:00
Rahul Gupta
1a00bff06d
Hindi: Adds tests for lexical attributes (norm and like_num) ( #5829 )
...
* Hindi: Adds tests for lexical attributes (norm and like_num)
* Signs and adds the contributor agreement
* Add ordinal numbers to be tagged as like_num
* Adds alternate pronunciation for 31 and 39
2020-10-07 10:23:32 +02:00
Nuccy90
c809b2c8e7
Update morph_rules.py ( #6102 )
...
* Update morph_rules.py
Added "dig" and "dej" ("you" in accusative form)
* Create Nuccy90.md
* Update Nuccy90.md
2020-10-06 15:14:47 +02:00
delzac
15ea401b39
Reflect on usage doc that IS_SENT_START attribute exist ( #6114 )
...
* Reflect on usage doc that IS_SENT_START attribute exist
* Create delzac.md
2020-10-06 15:11:01 +02:00
Šarūnas Navickas
047fb9f8b8
Website (Universe): An entry for rita-dsl ( #6138 )
...
* Create zaibacu.md
* Add RITA-DSL entry
* Update agreement
* Fix formatting
2020-10-06 11:19:36 +02:00
Florijan Stamenković
9db670b996
Fix Issue 6207 ( #6208 )
...
* Regression test for issue 6207
* Fix issue 6207
* Sign contributor agreement
* Minor adjustments to test
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-10-06 11:17:37 +02:00
Ines Montani
59deeb7da6
Merge branch 'develop' into master-tmp
2020-10-04 14:52:20 +02:00
Stanislav Schmidt
3589a64d44
Change type of texts argument in pipe to iterable ( #6186 )
...
* Change type of texts argument in pipe to iterable
* Add contributor agreement
2020-10-02 21:00:11 +02:00
Muhammad Fahmi Rasyid
7489d02dea
Update Indonesian Example Phrases ( #6124 )
...
* create contributor agreement
* Update Indonesian example. (see #1107 )
Update Indonesian examples with more appropriate phrases. The current phrases contain sensitive and violent words.
2020-09-23 14:02:26 +02:00
Ines Montani
864a697e63
Merge branch 'develop' into master-tmp
2020-09-04 13:15:36 +02:00
Juan Gutiérrez
9002bea29f
Update suffixes example ( #5989 )
...
* Update suffixes example
The current example will throw `TypeError: can only concatenate list (not "tuple") to list`
* Signing Contributor Agreement
2020-08-31 12:44:56 +02:00
Shashank
450720aca2
Added support for Sanskrit language ( #5956 )
...
* Added support for Sanskrit language
* Added tests for lexical attribute like_num
2020-08-25 10:56:29 +02:00
idoshr
b10c7bc56e
Hebrew like num ( #5952 )
...
* Update stop_words.py
Hebrew STOP WORDS
* Update stop_words.py
* contributor
* contributor
* add some common domain extensions
support human number 1K/1M....
* support human number 1K/1M....
* hebrew number tokenize
1K/1M implement in EN
* test human tokenize fix
* test
* heb like num
revert human number change
* heb like num
2020-08-24 14:30:05 +02:00
Attila Szász
669dc70822
Create tilusnet.md ( #5914 )
2020-08-12 22:46:08 +02:00
Adam Bittlingmayer
7b33b2854f
Add Armenian sentence-final verchaket, Greek question mark and Arabic question mark to default punct ( #5910 )
...
* Add Armenian sentence-final verchaket
* Add Greek and Arabic question marks, and contributor agreement
* Check box
2020-08-12 15:36:14 +02:00
graue70
49e690bde1
Fix typos in comments ( #5904 )
...
* Fix typo in comment
* Fix typo
* Add spaCy Contributor Agreement
2020-08-12 15:35:25 +02:00
holubvl3
d16c0f2c3a
Create holubvl3 ( #5845 )
...
* Create holubvl3
* Rename holubvl3 to holubvl3.md
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2020-07-30 17:40:31 +02:00
Gustavo Zadrozny Leyendecker
90b958fd01
Fix on EntityRendered to support break lines (after last entity) ( closes #5838 )
2020-07-29 18:48:39 +02:00
Li Zhe
a69eb445dc
fix the wrong hash url in adding-languages.md file ( #5810 )
...
* fix the wrong hash url in adding-languages.md file
change the #101 url hash path to #language-data
* filled in the spaCy Contributor Agreement
filled in the spaCy Contributor Agreement
2020-07-25 13:13:38 +02:00
Joshua Olson
6d4d5c074c
Mark Japanese documents as tagged. ( #5803 )
...
Mark the document as tagged before returning it to the user from the JapaneseTokenizer.
Fixes #5802
2020-07-23 08:57:01 +02:00
Ines Montani
644074b954
Merge branch 'develop' into master-tmp
2020-07-20 14:58:04 +02:00
Alec Chapman
a8978ca285
Add VA COVID-19 NLP project to spaCy Universe ( #5777 )
...
* Update universe.json
Add cov-bsv to "resources"
* Update universe.json
* add contributor agreement
2020-07-19 13:35:31 +02:00
gandersen101
9097549227
Adding spaczz package to universe.json ( #5717 )
...
* Adding spaczz package to universe.json
* Adding contributor agreement.
2020-07-07 20:55:24 +02:00
Jonathan Besomi
546f3d10d4
Add texthero to universe.json ( #5716 )
...
* Add texthero to universe.json
* Add spaCy contributor Agreement
2020-07-07 20:54:22 +02:00
Mike Izbicki
7a2ca00794
fix bug in Korean language, resulting in 100x speedup by reducing overhead of mecab ( #5701 )
...
* speed up Korean nlp 100x by stopping mecab from reloading on each doc
* add contributor agreement
* rename variables to improve code readability
2020-07-06 17:03:33 +02:00
Sebastián Ramírez
b985cc4025
📄 Add spaCy Contributor Agreement
2020-07-01 20:57:21 +02:00
Ines Montani
414dc7ace1
Merge branch 'spacy.io' into spacy.io-develop
2020-07-01 11:47:47 +02:00
Matthias Hertel
305221f3e5
Website: fixed the token span in the text about the rule-based matching example ( #5669 )
...
* fixed token span in pattern matcher example
* contributor agreement
2020-06-30 19:58:55 +02:00
Matthias Hertel
8b0f749606
Website: fixed the token span in the text about the rule-based matching example ( #5669 )
...
* fixed token span in pattern matcher example
* contributor agreement
2020-06-30 19:58:23 +02:00
PluieElectrique
90c7eb0e2f
Reduce memory usage of Lookup's BloomFilter ( #5606 )
...
* Reduce memory usage of Lookup's BloomFilter
* Remove extra Table update
2020-06-26 14:09:10 +02:00
Richard Liaw
0ef78bad93
contribute ( #5632 )
2020-06-23 08:53:58 +02:00
Rameshh
c34420794a
Add Nepali Language ( #5622 )
...
* added support for nepali lang
* added examples and test files
* added spacy contributor agreement
2020-06-22 10:25:46 +02:00
Karen Hambardzumyan
ff6a084e9c
Create mahnerak.md ( #5615 )
2020-06-20 11:14:26 +02:00
Marat M. Yavrumyan
ccd7edf04b
Create myavrum.md ( #5612 )
2020-06-19 18:34:27 +02:00
Arvind Srinivasan
aa5b40fa64
Added Tamil Example Sentences ( #5583 )
...
* Added Examples for Tamil Sentences
#### Description
This PR adds example sentences for the Tamil language, which were missing as per issue #1107
#### Type of Change
This is an enhancement.
* Accepting spaCy Contributor Agreement
* Signed on my behalf as an individual
2020-06-13 15:56:26 +02:00
theudas
fa46e0bef2
Added Parameter to NEL to take n sentences into account ( #5548 )
...
* added setting for neighbour sentence in NEL
* added spaCy contributor agreement
* added multi sentence also for training
* made the try-except block smaller
2020-06-12 02:03:23 +02:00
Jones Martins
28db7dd5d9
Add missing pronoums/determiners ( #5569 )
...
* Add missing pronoums/determiners
* Add test for missing pronoums
* Add contributor file
2020-06-10 18:47:04 +02:00
Martino Mensio
de00f967ce
adding spacy-universal-sentence-encoder ( #5534 )
...
* adding spacy-universal-sentence-encoder
* update affiliation
* updated code example
2020-06-08 20:26:30 +02:00
Hiroshi Matsuda
456bf47f51
fix a bug causing mis-alignments ( #5560 )
2020-06-08 15:49:34 +02:00
Leo
7d5a89661e
contributor agreement signed ( #5525 )
2020-05-31 20:13:39 +02:00
Rajat
8b8efa1b42
update spacy universe with my project ( #5497 )
...
* added contextualSpellCheck in spacy universe meta
* removed extra formatting by code
* updated with permanent links
* run json linter used by spacy
* filled SCA
* updated the description
2020-05-25 11:30:23 +02:00
Jannis
aa53ce6996
Documentation Typo Fix ( #5492 )
...
* Fix typo
Change 'realize' to 'realise'
* Add contributor agreement
2020-05-22 19:50:26 +02:00
Matthew Honnibal
93c4d13588
Merge pull request #5264 from lfiedler/issue-5230
...
Fix ResourceWarnings during unittest
2020-05-22 00:31:07 +02:00
Kevin Lu
9a1a535215
Create kevinlu1248.md
2020-05-19 20:25:45 -07:00
Ines Montani
a41e28ceba
Merge pull request #5436 from ilivans/fix_errors_with_codes
2020-05-18 10:45:56 +02:00
Ilkyu Ju
72a25c9cef
Very minor issues in Korean example sentences ( #5446 )
...
* Add contributor agreement
* Improve ko translation of example sentences
I fixed unnatural translations and word spacing errors.
* Update osori.md
2020-05-17 13:43:34 +02:00
Ilia Ivanov
ee8fe37474
Add ilivans' contributor agreement
2020-05-14 15:59:06 +02:00
Vishnu Priya VR
9ce059dd06
Limiting noun_chunks for specific languages ( #5396 )
...
* Limiting noun_chunks for specific languages
* Limiting noun_chunks for specific languages
Contributor Agreement
* Addressing review comments
* Removed unused fixtures and imports
* Add fa_tokenizer in test suite
* Use fa_tokenizer in test
* Undo extraneous reformatting
Co-authored-by: adrianeboyd <adrianeboyd@gmail.com>
2020-05-14 12:58:06 +02:00
Travis Hoppe
d4cc18b746
Added author information for NLPre ( #5414 )
...
* Add author links for NLPre and update category
* Add contributor statement
2020-05-08 11:28:54 +02:00
Samuel Rodríguez Medina
8602daba85
Swedish like_num ( #5371 )
...
* Sign contributor agreement.
* Add like_num functionality to Swedish.
* Update spacy/tests/lang/sv/test_lex_attrs.py
Co-Authored-By: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
* Update contributor agreement
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2020-04-29 21:25:22 +02:00
adrianeboyd
a6e521cd79
Add is_sent_end token property ( #5375 )
...
Reconstruction of the original PR #4697 by @MiniLau.
Removes unused `SENT_END` symbol and `IS_SENT_END` from `Matcher` schema
because the Matcher is only going to be able to support `IS_SENT_START`.
2020-04-29 12:53:16 +02:00
Louis Guitton
a27c4014f5
Add mlflow to spaCy universe ( #5352 )
...
* Add mlflow to universe
* Use mlflow black logo
2020-04-29 10:18:03 +02:00
Michael
5b5528ff2e
Add !=3.4.* to python_requires ( #5344 )
...
Missed in 80d554f2e2
2020-04-27 22:02:09 +02:00
Punitvara
b2b7e1f37a
This PR adds Gujarati Language class along with ( #5355 )
...
* This PR adds Gujarati Language class along with
- stop words
* Add test for gu tokenizer
2020-04-27 11:07:37 +02:00
sabiqueqb
fc91660aa2
Gh 5339 language class for malayalam ( #5342 )
...
* Initialize Malayalam Language class
* Add lex_attrs and examples for Malayalam
* Add spaCy Contributor Agreement
* Add test for ml tokenizer
2020-04-27 09:45:08 +02:00
Mike
481574cbc8
[minor doc change] embedding vis. link is broken in website/docs/usage/examples.md ( #5325 )
...
* The embedding vis. link is broken
The first link seems to be reasonable for now unless someone has an updated embedding vis they want to share?
* contributor agreement
* Update Mlawrence95.md
* Update website/docs/usage/examples.md
Co-Authored-By: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2020-04-21 20:35:12 +02:00
laszabine
fb73d4943a
Amend documentation to Language.evaluate ( #5319 )
...
* Specified usage of arguments to Language.evaluate
* Created contributor agreement
2020-04-16 20:00:18 +02:00
Jakob Jul Elben
663333c3b2
Fixes #5413 ( #5315 )
...
* Fix 5314
* Add contributor
* Resolve requested changes
Co-authored-by: Jakob Jul Elben <jakob@datamaga.com>
2020-04-16 13:29:02 +02:00
Sébastien Harinck
dac70f29eb
contrib: add contributor agreement for user sebastienharinck ( #5316 )
2020-04-16 11:32:09 +02:00
Paolo Arduin
1ca32d8f9c
Matcher support for Span as well as Doc ( #5113 )
...
* Matcher support for Span, as well as Doc #5056
* Removes an import unused
* Signed contributors agreement
* Code optimization and better test
* Add error message for bad Matcher call argument
* Fix merging
2020-04-15 13:51:33 +02:00
Thomas Thiebaud
1eef60c658
Add spacy_fastlang to universe ( #5271 )
...
* Add spacy_fastlang to universe
* Sign SCA
2020-04-15 13:50:46 +02:00
Paolo Arduin
8ce408d2e1
Comparison predicate handling for != ( #5282 )
...
* Fix #5281
* Optim test
2020-04-14 19:14:15 +02:00
Marek Grzenkowicz
6a8a52650f
[ Closes #5292 ] Fix typo in option name "--n-save_every" ( #5293 )
...
* Sign contributor agreement for chopeen
* Fix typo in option name and close #5292
2020-04-11 23:35:01 +02:00
Umar Butler
8952effcc4
Fixed Typo in Warning ( #5284 )
...
* Fixed typo in cli warning
Fixed a typo in the warning for the provision of exactly two labels, which have not been designated as binary, to textcat.
* Create and signed contributor form
2020-04-09 15:46:15 +02:00
Leander Fiedler
b63871ceff
issue5230: added contributors agreement
2020-04-06 21:04:06 +02:00
vincent d warmerdam
f329d5663a
add "whatlies" to spaCy universe ( #5252 )
...
* Add "whatlies"
We're releasing it on our side officially on the 16th of April. If possible, let's announce around the same time :)
* sign contributor thing
* Added fancy gif as the image
* Update universe.json
Spelling error and spaCy clarification.
2020-04-06 11:29:30 +02:00
YohannesDatasci
beef184e53
Armenian language support ( #5246 )
...
* add Armenian language and test cases
* agreement submission
2020-04-03 13:02:18 +02:00
Michael Leichtfried
2b14997b68
Remove duplicated branch in if/else-if statement ( #5234 )
...
* Remove duplicated branch in if-elif-statement
* Add contributor agreement for leicmi
2020-04-02 14:47:42 +02:00
Jacob Lauritzen
0b76212831
Extend and fix Danish examples ( #5227 )
...
* Extend and fix Danish examples
This PR fixes two examples, adds additional examples translated from the English version, and adds punctuation.
The two changed examples are:
* "fortov" changed to "fortovet", which is more [used](https://www.google.com/search?client=firefox-b-d&sxsrf=ALeKk0143gEuPe4IbIUpzBBt-oU10OMVqA%3A1585549036477&ei=7I6BXuvJHMGOrwSqi46oCQ&q=l%C3%B8behjul+p%C3%A5+fortov&oq=l%C3%B8behjul+p%C3%A5+fortov&gs_lcp=CgZwc3ktYWIQAzIECAAQRzIECAAQRzIECAAQRzIECAAQRzIECAAQRzIECAAQRzIECAAQRzIECAAQR1DT8xZY0_MWYK_0FmgAcAZ4AIABAIgBAJIBAJgBAKABAaoBB2d3cy13aXo&sclient=psy-ab&ved=0ahUKEwjr7964xsHoAhVBx4sKHaqFA5UQ4dUDCAo&uact=5 ) and more natural. The Swedish and Norwegian examples also use this version of the word.
* "stor by" changed to "storby". In Danish we have a specific noun to describe a large, metropolitan city which is different from just describing a city as "large". In this sentence it would be much more natural to describe London as a "storby". Google even corrects a search for "London stor by" to "London storby".
* Sign contrib agreement
2020-04-02 10:42:35 +02:00
Nikhil Saldanha
4f27a24f5b
Add kannada examples ( #5162 )
...
* Add example sentences for Kannada
* sign contributor agreement
2020-03-29 13:54:42 +02:00
Tom Milligan
e904958115
Limit to cupy-cuda v8, so as not to pull in v9 automatically. ( #5194 )
2020-03-29 13:52:08 +02:00
Tiljander
e53232533b
Describing priority rules for overlapping matches ( #5197 )
...
* Describing priority rules for overlapping matches
* Create Tiljander.md
* Describing priority rules for overlapping matches
* Update website/docs/api/entityruler.md
Co-Authored-By: Ines Montani <ines@ines.io>
Co-authored-by: Ines Montani <ines@ines.io>
2020-03-26 13:13:22 +01:00
Ines Montani
3fc2309c48
Merge pull request #5174 from Baciccin/master
...
Add Ligurian language
2020-03-24 16:33:59 +01:00
Philip Gillißen
128acb9ee1
Update guerda.md
2020-03-24 10:42:30 +01:00
Philip Gillißen
5d067bcc5e
Add SCA for guerda
2020-03-24 10:42:10 +01:00
Baciccin
3b53617a69
Add Ligurian language
2020-03-19 21:37:01 -07:00
Ines Montani
17bd9ed84f
Merge pull request #5153 from pinealan/fix/website-docs
...
Fix website typos and weird sentences
2020-03-16 15:03:01 +01:00
Alan Chan
1ae01684cf
Fill in contributor agreement
2020-03-15 03:45:20 +08:00
nihil
9cde7eb08c
add spacy_syllables to universe + sign contributor agreement
2020-03-13 18:09:42 +01:00
Himanshu Garg
27d1300bdb
Create merrcury.md
2020-03-10 15:11:07 +05:30
Mark Abraham
0345135167
Tokenizer to_disk and from_disk now ensure paths ( #5116 )
...
* Tokenizer to_disk and from_disk now ensure strings are converted to paths
Fixes #5115
* Sign contributor agreement
2020-03-08 13:25:56 +01:00