Commit Graph

356 Commits

Author SHA1 Message Date
Lorena Ciutacu
f11002f1f1
add new Romanian stopwords (#6621)
* add contributor agreement

* update ro stopwords list

* add new stopwords
2021-01-08 11:34:47 +11:00
ophelielacroix
e3222fdec9
Add (noun chunks) syntax iterators for Danish (#6246)
* add syntax iterators for danish

* add test noun chunks for danish syntax iterators

* add contributor agreement

* update da syntax iterators to remove nested chunks

* add tests for da noun chunks

* Fix test

* add missing import
* fix example

* Prevent overlapping noun chunks

Prevent overlapping noun chunks by tracking the end index of the
previous noun chunk span.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-01-07 16:33:00 +11:00
Yosi
cf52510631
Add Amharic አማርኛ Language support (#6583)
* Add Amharic to space

* clean up

* Add some PRON_LEMMA

* add Tigrinya support

* remove text_noun_chunks

* Tigrinya Support

* added some more details for ti

* fix unit test

* add amharic char range

* changes from review

* amharic and tigrinya share same unicode block

* get rid of _amharic/_tigrinya in char_classes

Co-authored-by: Josiah Solomon <jsolomon@meteorcomm.com>
2020-12-22 16:50:34 +01:00
Ines Montani
d8aa113d16
Merge pull request #6566 from rafguns/cite-zenodo [ci skip] 2020-12-16 16:40:50 +11:00
Thomas Bird
f6e4378942
Add SCA for @thomasbird (#6576) 2020-12-15 20:59:47 +01:00
Raf Guns
a90ca0e1fb
Add contributor agreement 2020-12-14 22:01:14 +01:00
Jacob Bortell
992723dfac
Add jabortell to the contributors (#6422)
* Add jabortell to the contributors

* Update jabortell.md

Added tick to applicable statement
2020-11-24 16:15:31 +01:00
Yusuke Mori
e3ac90b035
Avoid a SyntaxError in self-attentive-parser (#6428)
* Avoid a SyntaxError in self-attentive-parser

Fix a usage of quotation marks in the example of spaCy Universe self-attentive-parser

* Create forest1988.md

Fill in the spaCy contributor agreement
2020-11-22 21:59:37 +01:00
M. Revuelta Espinosa
51232ffb9e
Update universe.json (include PatternOmatic) (#6399)
Request to include PatternOmatic in spaCy Universe

Adds @revuel to contributors
2020-11-19 13:15:50 +01:00
Daniel Vasic
20d72de986
Added Multext-East V5 tagset for Croatian language (#6248)
* Added Multext-East V5 tagset for Croatian language

* Create danielvasic.md

* Update danielvasic.md

* Update danielvasic.md

* Add tag map to CroatianDefaults

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-11-05 12:19:22 +01:00
Vu Ha
6d465ec52c
add oprd to the list of accepted deps for noun chunking (#6302)
* add oprd to the list of accepted deps for noun chunking

* add SCA
2020-11-05 09:17:35 +01:00
Robert Šípek
260c29794a
Fill contributor agreement by robertsipek (#6285)
* Fill contributor agreement by robertsipek

* Fill contributor agreement by robertsipek
2020-10-22 22:13:17 +02:00
Kunal Sharma
01aec7a313
Adding MindMeld to Universe JSON (#6275)
* Adding Mindmeld to Universe JSON

Mindmeld is a conversational AI platform for deep-domain voice interfaces and chatbots. https://www.mindmeld.com/

* Signing contribution agreement.

Co-authored-by: kunshar2 <kunshar2@cisco.com>
2020-10-21 18:42:11 +02:00
walterhenry
ff82644746
User contributor agreement
Here it is!
2020-10-19 16:25:09 +02:00
Borijan Georgievski
2311192ba1
Include Macedonian language (#6230)
* Include Macedonian language

* Fix indentation at char_classes.py

* Fix indentation at char_classes.py

* Add Macedonian tests, update lex_attrs and char_classes

* Import unicode literals for python 2
2020-10-15 15:55:01 +02:00
Rahul Gupta
1a00bff06d
Hindi: Adds tests for lexical attributes (norm and like_num) (#5829)
* Hindi: Adds tests for lexical attributes (norm and like_num)

* Signs and adds the contributor agreement

* Add ordinal numbers to be tagged as like_num

* Adds alternate pronunciation for 31 and 39
2020-10-07 10:23:32 +02:00
Nuccy90
c809b2c8e7
Update morph_rules.py (#6102)
* Update morph_rules.py

Added "dig" and "dej" ("you" in accusative form)

* Create Nuccy90.md

* Update Nuccy90.md
2020-10-06 15:14:47 +02:00
delzac
15ea401b39
Reflect on usage doc that IS_SENT_START attribute exist (#6114)
* Reflect on usage doc that IS_SENT_START attribute exist

* Create delzac.md
2020-10-06 15:11:01 +02:00
Šarūnas Navickas
047fb9f8b8
Website (Universe): An entry for rita-dsl (#6138)
* Create zaibacu.md

* Add RITA-DSL entry

* Update agreement

* Fix formatting
2020-10-06 11:19:36 +02:00
Florijan Stamenković
9db670b996
Fix Issue 6207 (#6208)
* Regression test for issue 6207

* Fix issue 6207

* Sign contributor agreement

* Minor adjustments to test

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-10-06 11:17:37 +02:00
Stanislav Schmidt
3589a64d44
Change type of texts argument in pipe to iterable (#6186)
* Change type of texts argument in pipe to iterable

* Add contributor agreement
2020-10-02 21:00:11 +02:00
Muhammad Fahmi Rasyid
7489d02dea
Update Indonesian Example Phrases (#6124)
* create contributor agreement

* Update Indonesian example. (see #1107)

Update Indonesian examples with more proper phrases. The current phrases contain sensitive and violent words.
2020-09-23 14:02:26 +02:00
Juan Gutiérrez
9002bea29f
Update suffixes example (#5989)
* Update suffixes example

The current example will throw `TypeError: can only concatenate list (not "tuple") to list`

* Signing Contributor Agreement
2020-08-31 12:44:56 +02:00
Shashank
450720aca2
Added support for Sanskrit language (#5956)
* Added support for Sanskrit language

* Added tests for lexical attribute like_num
2020-08-25 10:56:29 +02:00
idoshr
b10c7bc56e
Hebrew like num (#5952)
* Update stop_words.py

Hebrew STOP WORDS

* Update stop_words.py

* contributor

* contributor

* add some common domain extensions
support human number 1K/1M....

* support human number 1K/1M....

* hebrew number tokenize
1K/1M implement in EN

* test human tokenize fix

* test

* heb like num
revert human number change

* heb like num
2020-08-24 14:30:05 +02:00
Attila Szász
669dc70822
Create tilusnet.md (#5914) 2020-08-12 22:46:08 +02:00
Adam Bittlingmayer
7b33b2854f
Add Armenian sentence-final verchaket, Greek question mark and Arabic question mark to default punct (#5910)
* Add Armenian sentence-final verchaket

* Add Greek and Arabic question marks, and contributor agreement

* Check box
2020-08-12 15:36:14 +02:00
graue70
49e690bde1
Fix typos in comments (#5904)
* Fix typo in comment

* Fix typo

* Add spaCy Contributor Agreement
2020-08-12 15:35:25 +02:00
holubvl3
d16c0f2c3a
Create holubvl3 (#5845)
* Create holubvl3

* Rename holubvl3 to holubvl3.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2020-07-30 17:40:31 +02:00
Gustavo Zadrozny Leyendecker
90b958fd01
Fix on EntityRendered to support break lines (after last entity) (closes #5838) 2020-07-29 18:48:39 +02:00
Li Zhe
a69eb445dc
fix the wrong hash url in adding-languages.md file (#5810)
* fix the wrong hash url in adding-languages.md file

change the #101 url hash path to #language-data

* filled in the spaCy Contributor Agreement 

filled in the spaCy Contributor Agreement
2020-07-25 13:13:38 +02:00
Joshua Olson
6d4d5c074c
Mark Japanese documents as tagged. (#5803)
Mark the document as tagged before returning it to the user from the JapaneseTokenizer.
Fixes #5802
2020-07-23 08:57:01 +02:00
Alec Chapman
a8978ca285
Add VA COVID-19 NLP project to spaCy Universe (#5777)
* Update universe.json

Add cov-bsv to "resources"

* Update universe.json

* add contributor agreement
2020-07-19 13:35:31 +02:00
gandersen101
9097549227
Adding spaczz package to universe.json (#5717)
* Adding spaczz package to universe.json

* Adding contributor agreement.
2020-07-07 20:55:24 +02:00
Jonathan Besomi
546f3d10d4
Add texthero to universe.json (#5716)
* Add texthero to universe.json

* Add spaCy contributor Agreement
2020-07-07 20:54:22 +02:00
Mike Izbicki
7a2ca00794
fix bug in Korean language, resulting in 100x speedup by reducing overhead of mecab (#5701)
* speed up Korean nlp 100x by stopping mecab from reloading on each doc

* add contributor agreement

* rename variables to improve code readability
2020-07-06 17:03:33 +02:00
Matthias Hertel
8b0f749606
Website: fixed the token span in the text about the rule-based matching example (#5669)
* fixed token span in pattern matcher example

* contributor agreement
2020-06-30 19:58:23 +02:00
PluieElectrique
90c7eb0e2f
Reduce memory usage of Lookup's BloomFilter (#5606)
* Reduce memory usage of Lookup's BloomFilter

* Remove extra Table update
2020-06-26 14:09:10 +02:00
Richard Liaw
0ef78bad93
contribute (#5632) 2020-06-23 08:53:58 +02:00
Rameshh
c34420794a
Add Nepali Language (#5622)
* added support for nepali lang

* added examples and test files

* added spacy contributor agreement
2020-06-22 10:25:46 +02:00
Karen Hambardzumyan
ff6a084e9c
Create mahnerak.md (#5615) 2020-06-20 11:14:26 +02:00
Marat M. Yavrumyan
ccd7edf04b
Create myavrum.md (#5612) 2020-06-19 18:34:27 +02:00
Arvind Srinivasan
aa5b40fa64
Added Tamil Example Sentences (#5583)
* Added Examples for Tamil Sentences

#### Description
This PR add example sentences for the Tamil language which were missing as per issue #1107 

#### Type of Change
This is an enhancement.

* Accepting spaCy Contributor Agreement

* Signed on my behalf as an individual
2020-06-13 15:56:26 +02:00
theudas
fa46e0bef2
Added Parameter to NEL to take n sentences into account (#5548)
* added setting for neighbour sentence in NEL

* added spaCy contributor agreement

* added multi sentence also for training

* made the try-except block smaller
2020-06-12 02:03:23 +02:00
Jones Martins
28db7dd5d9
Add missing pronouns/determiners (#5569)
* Add missing pronouns/determiners

* Add test for missing pronouns

* Add contributor file
2020-06-10 18:47:04 +02:00
Martino Mensio
de00f967ce
adding spacy-universal-sentence-encoder (#5534)
* adding spacy-universal-sentence-encoder

* update affiliation

* updated code example
2020-06-08 20:26:30 +02:00
Hiroshi Matsuda
456bf47f51
fix a bug causing mis-alignments (#5560) 2020-06-08 15:49:34 +02:00
Leo
7d5a89661e
contributor agreement signed (#5525) 2020-05-31 20:13:39 +02:00
Rajat
8b8efa1b42
update spacy universe with my project (#5497)
* added contextualSpellCheck in spacy universe meta

* removed extra formatting by code

* updated with permanent links

* run json linter used by spacy

* filled SCA

* updated the description
2020-05-25 11:30:23 +02:00
Jannis
aa53ce6996
Documentation Typo Fix (#5492)
* Fix typo

Change 'realize' to 'realise'

* Add contributor agreement
2020-05-22 19:50:26 +02:00