339 Commits

Author SHA1 Message Date
Ines Montani
864a697e63 Merge branch 'develop' into master-tmp 2020-09-04 13:15:36 +02:00
Juan Gutiérrez
9002bea29f
Update suffixes example (#5989)
* Update suffixes example

The current example will throw `TypeError: can only concatenate list (not "tuple") to list`

* Signing Contributor Agreement
2020-08-31 12:44:56 +02:00
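
The error quoted in the commit above is ordinary Python behaviour whenever a list and a tuple are joined with `+`. A minimal sketch that reproduces it and shows one way around it; the pattern values are hypothetical stand-ins, not the actual suffix rules or the actual change made in the commit:

```python
# Hypothetical stand-ins; these are not spaCy's real default suffix patterns.
default_suffixes = [r"\.$", r",$"]   # a list
extra_suffixes = (r"-+$",)           # a tuple

try:
    combined = default_suffixes + extra_suffixes   # list + tuple is not allowed
except TypeError as err:
    error_message = str(err)

# Converting one operand so that both sides have the same type avoids the error.
combined = default_suffixes + list(extra_suffixes)
```

Whether the list or the tuple gets converted depends on which type the surrounding code expects; the sketch converts the tuple only for illustration.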
Shashank
450720aca2
Added support for Sanskrit language (#5956)
* Added support for Sanskrit language

* Added tests for lexical attribute like_num
2020-08-25 10:56:29 +02:00
idoshr
b10c7bc56e
Hebrew like num (#5952)
* Update stop_words.py

Hebrew STOP WORDS

* Update stop_words.py

* contributor

* contributor

* add some common domain extensions
support human number 1K/1M....

* support human number 1K/1M....

* hebrew number tokenize
1K/1M implement in EN

* test human tokenize fix

* test

* heb like num
revert human number change

* heb like num
2020-08-24 14:30:05 +02:00
Attila Szász
669dc70822
Create tilusnet.md (#5914) 2020-08-12 22:46:08 +02:00
Adam Bittlingmayer
7b33b2854f
Add Armenian sentence-final verchaket, Greek question mark and Arabic question mark to default punct (#5910)
* Add Armenian sentence-final verchaket

* Add Greek and Arabic question marks, and contributor agreement

* Check box
2020-08-12 15:36:14 +02:00
graue70
49e690bde1
Fix typos in comments (#5904)
* Fix typo in comment

* Fix typo

* Add spaCy Contributor Agreement
2020-08-12 15:35:25 +02:00
holubvl3
d16c0f2c3a
Create holubvl3 (#5845)
* Create holubvl3

* Rename holubvl3 to holubvl3.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2020-07-30 17:40:31 +02:00
Gustavo Zadrozny Leyendecker
90b958fd01
Fix in EntityRenderer to support line breaks (after last entity) (closes #5838) 2020-07-29 18:48:39 +02:00
Li Zhe
a69eb445dc
fix the wrong hash url in adding-languages.md file (#5810)
* fix the wrong hash url in adding-languages.md file

change the #101 url hash path to #language-data

* filled in the spaCy Contributor Agreement 

filled in the spaCy Contributor Agreement
2020-07-25 13:13:38 +02:00
Joshua Olson
6d4d5c074c
Mark Japanese documents as tagged. (#5803)
Mark the document as tagged before returning it to the user from the JapaneseTokenizer.
Fixes #5802
2020-07-23 08:57:01 +02:00
Ines Montani
644074b954 Merge branch 'develop' into master-tmp 2020-07-20 14:58:04 +02:00
Alec Chapman
a8978ca285
Add VA COVID-19 NLP project to spaCy Universe (#5777)
* Update universe.json

Add cov-bsv to "resources"

* Update universe.json

* add contributor agreement
2020-07-19 13:35:31 +02:00
gandersen101
9097549227
Adding spaczz package to universe.json (#5717)
* Adding spaczz package to universe.json

* Adding contributor agreement.
2020-07-07 20:55:24 +02:00
Jonathan Besomi
546f3d10d4
Add texthero to universe.json (#5716)
* Add texthero to universe.json

* Add spaCy contributor Agreement
2020-07-07 20:54:22 +02:00
Mike Izbicki
7a2ca00794
fix bug in Korean language, resulting in 100x speedup by reducing overhead of mecab (#5701)
* speed up Korean nlp 100x by stopping mecab from reloading on each doc

* add contributor agreement

* rename variables to improve code readability
2020-07-06 17:03:33 +02:00
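
The speedup described in the commit above comes from constructing an expensive resource once instead of once per document. A minimal sketch of that general pattern, assuming a hypothetical `SlowTagger` class in place of MeCab; none of these names come from the spaCy code base:

```python
class SlowTagger:
    """Hypothetical stand-in for an expensive-to-load tagger such as MeCab."""

    loads = 0  # counts how many times a tagger has been constructed

    def __init__(self):
        SlowTagger.loads += 1

    def parse(self, text):
        return text.split()


class ReloadingTokenizer:
    """Builds a new tagger for every document (the slow pattern)."""

    def __call__(self, text):
        return SlowTagger().parse(text)


class CachingTokenizer:
    """Builds the tagger once and reuses it for every document."""

    def __init__(self):
        self._tagger = SlowTagger()

    def __call__(self, text):
        return self._tagger.parse(text)


docs = ["a b", "c d", "e f"]

SlowTagger.loads = 0
reloading = ReloadingTokenizer()
for doc in docs:
    reloading(doc)
reloading_loads = SlowTagger.loads  # one construction per document

SlowTagger.loads = 0
caching = CachingTokenizer()
for doc in docs:
    caching(doc)
caching_loads = SlowTagger.loads  # a single construction in total
```

The actual gain depends on how costly the construction is relative to parsing; the 100x figure is the commit author's measurement, not something this sketch demonstrates.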
Sebastián Ramírez
b985cc4025 📄 Add spaCy Contributor Agreement 2020-07-01 20:57:21 +02:00
Ines Montani
414dc7ace1 Merge branch 'spacy.io' into spacy.io-develop 2020-07-01 11:47:47 +02:00
Matthias Hertel
305221f3e5 Website: fixed the token span in the text about the rule-based matching example (#5669)
* fixed token span in pattern matcher example

* contributor agreement
2020-06-30 19:58:55 +02:00
Matthias Hertel
8b0f749606
Website: fixed the token span in the text about the rule-based matching example (#5669)
* fixed token span in pattern matcher example

* contributor agreement
2020-06-30 19:58:23 +02:00
PluieElectrique
90c7eb0e2f
Reduce memory usage of Lookup's BloomFilter (#5606)
* Reduce memory usage of Lookup's BloomFilter

* Remove extra Table update
2020-06-26 14:09:10 +02:00
Richard Liaw
0ef78bad93
contribute (#5632) 2020-06-23 08:53:58 +02:00
Rameshh
c34420794a
Add Nepali Language (#5622)
* added support for nepali lang

* added examples and test files

* added spacy contributor agreement
2020-06-22 10:25:46 +02:00
Karen Hambardzumyan
ff6a084e9c
Create mahnerak.md (#5615) 2020-06-20 11:14:26 +02:00
Marat M. Yavrumyan
ccd7edf04b
Create myavrum.md (#5612) 2020-06-19 18:34:27 +02:00
Arvind Srinivasan
aa5b40fa64
Added Tamil Example Sentences (#5583)
* Added Examples for Tamil Sentences

#### Description
This PR adds example sentences for the Tamil language, which were missing as per issue #1107

#### Type of Change
This is an enhancement.

* Accepting spaCy Contributor Agreement

* Signed on my behalf as an individual
2020-06-13 15:56:26 +02:00
theudas
fa46e0bef2
Added Parameter to NEL to take n sentences into account (#5548)
* added setting for neighbour sentence in NEL

* added spaCy contributor agreement

* added multi sentence also for training

* made the try-except block smaller
2020-06-12 02:03:23 +02:00
Jones Martins
28db7dd5d9
Add missing pronouns/determiners (#5569)
* Add missing pronouns/determiners

* Add test for missing pronouns

* Add contributor file
2020-06-10 18:47:04 +02:00
Martino Mensio
de00f967ce
adding spacy-universal-sentence-encoder (#5534)
* adding spacy-universal-sentence-encoder

* update affiliation

* updated code example
2020-06-08 20:26:30 +02:00
Hiroshi Matsuda
456bf47f51
fix a bug causing mis-alignments (#5560) 2020-06-08 15:49:34 +02:00
Leo
7d5a89661e
contributor agreement signed (#5525) 2020-05-31 20:13:39 +02:00
Rajat
8b8efa1b42
update spacy universe with my project (#5497)
* added contextualSpellCheck in spacy universe meta

* removed extra formatting by code

* updated with permanent links

* run json linter used by spacy

* filled SCA

* updated the description
2020-05-25 11:30:23 +02:00
Jannis
aa53ce6996
Documentation Typo Fix (#5492)
* Fix typo

Change 'realize' to 'realise'

* Add contributor agreement
2020-05-22 19:50:26 +02:00
Matthew Honnibal
93c4d13588
Merge pull request #5264 from lfiedler/issue-5230
Fix ResourceWarnings during unittest
2020-05-22 00:31:07 +02:00
Kevin Lu
9a1a535215
Create kevinlu1248.md 2020-05-19 20:25:45 -07:00
Ines Montani
a41e28ceba
Merge pull request #5436 from ilivans/fix_errors_with_codes 2020-05-18 10:45:56 +02:00
Ilkyu Ju
72a25c9cef
Very minor issues in Korean example sentences (#5446)
* Add contributor agreement

* Improve ko translation of example sentences

I fixed unnatural translations and word spacing errors.

* Update osori.md
2020-05-17 13:43:34 +02:00
Ilia Ivanov
ee8fe37474 Add ilivans' contributor agreement 2020-05-14 15:59:06 +02:00
Vishnu Priya VR
9ce059dd06
Limiting noun_chunks for specific languages (#5396)
* Limiting noun_chunks for specific languages

* Limiting noun_chunks for specific languages

Contributor Agreement

* Addressing review comments

* Removed unused fixtures and imports

* Add fa_tokenizer in test suite

* Use fa_tokenizer in test

* Undo extraneous reformatting

Co-authored-by: adrianeboyd <adrianeboyd@gmail.com>
2020-05-14 12:58:06 +02:00
Travis Hoppe
d4cc18b746
Added author information for NLPre (#5414)
* Add author links for NLPre and update category

* Add contributor statement
2020-05-08 11:28:54 +02:00
Samuel Rodríguez Medina
8602daba85
Swedish like_num (#5371)
* Sign contributor agreement.

* Add like_num functionality to Swedish.

* Update spacy/tests/lang/sv/test_lex_attrs.py

Co-Authored-By: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update contributor agreement

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2020-04-29 21:25:22 +02:00
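
A `like_num` lexical attribute, as added in the commit above, is a function that decides whether a token's text looks like a number. A minimal sketch of the usual shape of such a function; the word list is a deliberately tiny, hypothetical sample and is not the list contributed in the commit:

```python
# Hypothetical, deliberately tiny word list; a real language module would hold many more.
_num_words = {"noll", "en", "ett", "två", "tre", "fyra", "fem"}


def like_num(text):
    """Return True if the token text looks like a number."""
    # Drop a single leading sign or approximation marker.
    if text.startswith(("+", "-", "±", "~")):
        text = text[1:]
    # Ignore thousands and decimal separators.
    text = text.replace(",", "").replace(".", "")
    if text.isdigit():
        return True
    # Accept simple fractions such as 1/2.
    if text.count("/") == 1:
        num, denom = text.split("/")
        if num.isdigit() and denom.isdigit():
            return True
    # Fall back to spelled-out number words.
    return text.lower() in _num_words
```

Under these assumptions, `like_num("1/2")` and `like_num("Två")` are true, while an ordinary word such as `"hund"` is not.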
adrianeboyd
a6e521cd79
Add is_sent_end token property (#5375)
Reconstruction of the original PR #4697 by @MiniLau.

Removes unused `SENT_END` symbol and `IS_SENT_END` from `Matcher` schema
because the Matcher is only going to be able to support `IS_SENT_START`.
2020-04-29 12:53:16 +02:00
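
One way to think about a sentence-end flag is as the mirror image of the sentence-start flag: a token ends a sentence if it is the last token or if the next token starts one. A minimal sketch of that idea on plain lists of booleans; the function name is hypothetical, and the sketch ignores the case where sentence boundaries are unknown, so it should not be read as the library's implementation:

```python
def sent_end_flags(sent_starts):
    """Derive per-token sentence-end flags from per-token sentence-start flags."""
    last = len(sent_starts) - 1
    return [
        # A token ends a sentence if nothing follows it or the next token starts one.
        i == last or bool(sent_starts[i + 1])
        for i in range(len(sent_starts))
    ]


# Two sentences: tokens 0-2 and tokens 3-4.
starts = [True, False, False, True, False]
ends = sent_end_flags(starts)
```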
Louis Guitton
a27c4014f5
Add mlflow to spaCy universe (#5352)
* Add mlflow to universe

* Use mlflow black logo
2020-04-29 10:18:03 +02:00
Michael
5b5528ff2e
Add !=3.4.* to python_requires (#5344)
Missed in 80d554f2e2
2020-04-27 22:02:09 +02:00
Punitvara
b2b7e1f37a
This PR adds Gujarati Language class along with (#5355)
* This PR adds Gujarati Language class along with
- stop words

* Add test for gu tokenizer
2020-04-27 11:07:37 +02:00
sabiqueqb
fc91660aa2
Gh 5339 language class for malayalam (#5342)
* Initialize Malayalam Language class

* Add lex_attrs and examples for Malayalam

* Add spaCy Contributor Agreement

* Add test for ml tokenizer
2020-04-27 09:45:08 +02:00
Mike
481574cbc8
[minor doc change] embedding vis. link is broken in website/docs/usage/examples.md (#5325)
* The embedding vis. link is broken

The first link seems to be reasonable for now unless someone has an updated embedding vis they want to share?

* contributor agreement

* Update Mlawrence95.md

* Update website/docs/usage/examples.md

Co-Authored-By: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2020-04-21 20:35:12 +02:00
laszabine
fb73d4943a
Amend documentation to Language.evaluate (#5319)
* Specified usage of arguments to Language.evaluate

* Created contributor agreement
2020-04-16 20:00:18 +02:00
Jakob Jul Elben
663333c3b2
Fixes #5413 (#5315)
* Fix 5314

* Add contributor

* Resolve requested changes

Co-authored-by: Jakob Jul Elben <jakob@datamaga.com>
2020-04-16 13:29:02 +02:00
Sébastien Harinck
dac70f29eb
contrib: add contributor agreement for user sebastienharinck (#5316) 2020-04-16 11:32:09 +02:00