Commit Graph

384 Commits

Author SHA1 Message Date
René Octavio Queiroz Dias
999ff03b19
fix: Fix textcat labels to expect a Optional[Iterable[str]] instead of Optional[Dict] (#6911)
* docs: Add agreement

* bug: Regression test

Issue #6908

* fix: Changed from Dict to Iterable[str]

Fix #6908

* Update test to use make_tempdir

* fix: Fix WindowsPath error

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-02-04 23:37:13 +01:00
Helio Machado
20a97cda38
Create 0x2b3bfa0.md (#6916) 2021-02-04 23:25:11 +01:00
Ines Montani
30765674d0 Merge branch 'master' into develop 2021-01-30 12:20:28 +11:00
Pamphile ROY
e496b8623f
SCA tupui 2021-01-29 15:46:53 +01:00
Ines Montani
230e651ad6 Merge branch 'develop' into master-tmp 2021-01-27 13:26:29 +11:00
Ines Montani
d5ef245bb1
Merge pull request #6822 from jganseman/master [ci skip] 2021-01-27 13:04:30 +11:00
jganseman
c9103d60fa
Create jganseman.md 2021-01-26 11:02:31 +01:00
Dhruv Naik
e7db07a0b9
Fix Span.char_span bug (#6816)
* Create dhruvrnaik.md

* add test for issue #6815

* bugfix for issue #6815

* update dhruvrnaik.md

* add span.vector test for #6815
2021-01-26 15:50:37 +08:00
muratjumashev
79327197d1 Add contributor agreement 2021-01-25 00:34:12 +06:00
KeshavG-lb
0a86d833d7
Spacy Cli info method causing backward compatibility issues (#6793)
* Spacy Cli info method causing backward compatibility issues #6791

fix backward compatibility by setting default value to exclude in info
method.

* setting empty list as default argument is dangerous.
so setting default to None and then setting it to emptylist, if None.

Reference : https://nikos7am.com/posts/mutable-default-arguments/
2021-01-23 11:21:43 +01:00
Luigi Coniglio
e83c818a78
DependencyMatcher improvements (fix #6678) (#6744)
* Adding contributor agreement for user werew

* [DependencyMatcher] Comment and clean code

* [DependencyMatcher] Use defaultdicts

* [DependencyMatcher] Simplify _retrieve_tree method

* [DependencyMatcher] Remove prepended underscores

* [DependencyMatcher] Address TODO and move grouping of token's positions out of the loop

* [DependencyMatcher] Remove _nodes attribute

* [DependencyMatcher] Use enumerate in _retrieve_tree method

* [DependencyMatcher] Clean unused vars and use camel_case naming

* [DependencyMatcher] Memoize node+operator map

* Add root property to Token

* [DependencyMatcher] Groups matches by root

* [DependencyMatcher] Remove unused _keys_to_token attribute

* [DependencyMatcher] Use a list to map tokens to matcher's keys

* [DependencyMatcher] Remove recursion

* [DependencyMatcher] Use a generator to retrieve matches

* [DependencyMatcher] Remove unused memory pool

* [DependencyMatcher] Hide private methods and attributes

* [DependencyMatcher] Improvements to the matches validation

* Apply suggestions from code review

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>

* [DependencyMatcher] Fix keys_to_position_maps

* Remove Token.root property

* [DependencyMatcher] Remove functools' lru_cache

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2021-01-22 11:20:08 +11:00
Adriane Boyd
0c936004d1 Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-rc3 2021-01-14 11:49:58 +01:00
Antonio Miras
b4bd8f347a
spaCy Universe: New project; SpacyDotNet (#6702)
* Universe: SpacyDotNet a .NET Core spaCy wrapper

* Signed contributor agreement

Co-authored-by: Antonio Miras <antonio@amiras.net>
2021-01-13 12:47:30 +11:00
Alex Combessie
9cc880014c
Remove questionable French stopwords (#6310)
* Remove questionable French stopwords

* Create alexcombessie.md
2021-01-08 11:36:22 +11:00
Cristiana S Parada
7a0222f260
Update stop_words.py in Portuguese (a,o,e) (#6345)
* Update stop_words.py

Added three aditional stopwords: "a" and "o" that means "the", and "e" that means "and"

* Create cristianasp.md

* zero edit to push CI

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-01-08 11:35:38 +11:00
Lorena Ciutacu
f11002f1f1
add new Romanian stopwords (#6621)
* add contributor agreement

* update ro stopwords list

* add new stopwords
2021-01-08 11:34:47 +11:00
ophelielacroix
e3222fdec9
Add (noun chunks) syntax iterators for Danish (#6246)
* add syntax iterators for danish

* add test noun chunks for danish syntax iterators

* add contributor agreement

* update da syntax iterators to remove nested chunks

* add tests for da noun chunks

* Fix test

* add missing import
* fix example

* Prevent overlapping noun chunks

Prevent overlapping noun chunks by tracking the end index of the
previous noun chunk span.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-01-07 16:33:00 +11:00
Bruno
1a77607036
spaCy v3 is not saving the best version in training loop (#6629)
* Save best only if is the best and also respect the average config

* Create bratao.md

* Update loop.py

* Remove average check

* Keep before_to_disk
2021-01-06 12:51:30 +11:00
Yosi
cf52510631
Add Amharic አማርኛ Language support (#6583)
* Add Amharic to space

* clean up

* Add some PRON_LEMMA

* add Tigrinya support

* remove text_noun_chunks

* Tigrinya Support

* added some more details for ti

* fix unit test

* add amharic char range

* changes from review

* amharic and tigrinya share same unicode block

* get rid of _amharic/_tigrinya in char_classes

Co-authored-by: Josiah Solomon <jsolomon@meteorcomm.com>
2020-12-22 16:50:34 +01:00
Ines Montani
d8aa113d16
Merge pull request #6566 from rafguns/cite-zenodo [ci skip] 2020-12-16 16:40:50 +11:00
Thomas Bird
f6e4378942
Add SCA for @thomasbird (#6576) 2020-12-15 20:59:47 +01:00
Raf Guns
a90ca0e1fb Add contributor agreement 2020-12-14 22:01:14 +01:00
Adriane Boyd
724831b066 Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master
* Update Macedonian for v3
* Update Turkish for v3
2020-11-25 11:49:34 +01:00
Jacob Bortell
992723dfac
Add jabortell to the contributors (#6422)
* Add jabortell to the contributors

* Update jabortell.md

Added tick to applicable statement
2020-11-24 16:15:31 +01:00
Yusuke Mori
e3ac90b035
Avoid a SyntaxError in self-attentive-parser (#6428)
* Avoid a SyntaxError in self-attentive-parser

Fix a usage of quotation marks in the example of spaCy Universe self-attentive-parser

* Create forest1988.md

Fill in the spaCy contributor agreement
2020-11-22 21:59:37 +01:00
M. Revuelta Espinosa
51232ffb9e
Update universe.json (include PatternOmatic) (#6399)
Request to include PatternOmatic in spaCy Universe

Adds @revuel to contributors
2020-11-19 13:15:50 +01:00
Daniel Vasic
20d72de986
Added Multext-East V5 tagset for Croatian language (#6248)
* Added Multext-East V5 tagset for Croatian language

* Create danielvasic.md

* Update danielvasic.md

* Update danielvasic.md

* Add tag map to CroatianDefaults

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-11-05 12:19:22 +01:00
Vu Ha
6d465ec52c
add oprd to the list of accepted deps for noun chunking (#6302)
* add oprd to the list of accepted deps for noun chunking

* add SCA
2020-11-05 09:17:35 +01:00
Robert Šípek
260c29794a
Fill contributor agreement by robertsipek (#6285)
* Fill contributor agreement by robertsipek

* Fill contributor agreement by robertsipek
2020-10-22 22:13:17 +02:00
Kunal Sharma
01aec7a313
Adding MindMeld to Universe JSON (#6275)
* Adding Mindmeld to Universe JSON

Mindmeld is a conversational AI platform for deep-domain voice interfaces and chatbots. https://www.mindmeld.com/

* Signing contribution agreement.

Co-authored-by: kunshar2 <kunshar2@cisco.com>
2020-10-21 18:42:11 +02:00
walterhenry
ff82644746 User contributor agreement
Here it is!
2020-10-19 16:25:09 +02:00
Jan Margeta
ed1c37189a Add contributor agreement for jmargeta 2020-10-16 00:38:42 +02:00
Borijan Georgievski
2311192ba1
Include Macedonian language (#6230)
* Include Macedonian language

* Fix indentation at char_classes.py

* Fix indentation at char_classes.py

* Add Macedonian tests, update lex_attrs and char_classes

* Import unicode literals for python 2
2020-10-15 15:55:01 +02:00
Ines Montani
178760855f Merge branch 'develop' into master-tmp 2020-10-15 09:06:03 +02:00
Florijan Stamenković
18f5c309dc Fix Issue 6207 (#6208)
* Regression test for issue 6207

* Fix issue 6207

* Sign contributor agreement

* Minor adjustments to test

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-10-09 10:14:40 +02:00
Šarūnas Navickas
287ba94a2f Website (Universe): An entry for rita-dsl (#6138)
* Create zaibacu.md

* Add RITA-DSL entry

* Update agreement

* Fix formatting
2020-10-09 10:14:40 +02:00
delzac
668507be1b Reflect on usage doc that IS_SENT_START attribute exist (#6114)
* Reflect on usage doc that IS_SENT_START attribute exist

* Create delzac.md
2020-10-09 10:14:40 +02:00
Rahul Gupta
1a00bff06d
Hindi: Adds tests for lexical attributes (norm and like_num) (#5829)
* Hindi: Adds tests for lexical attributes (norm and like_num)

* Signs and sdds the contributor agreement

* Add ordinal numbers to be tagged as like_num

* Adds alternate pronunciation for 31 and 39
2020-10-07 10:23:32 +02:00
Nuccy90
c809b2c8e7
Update morph_rules.py (#6102)
* Update morph_rules.py

Added "dig" and "dej" ("you" in accusative form)

* Create Nuccy90.md

* Update Nuccy90.md
2020-10-06 15:14:47 +02:00
delzac
15ea401b39
Reflect on usage doc that IS_SENT_START attribute exist (#6114)
* Reflect on usage doc that IS_SENT_START attribute exist

* Create delzac.md
2020-10-06 15:11:01 +02:00
Šarūnas Navickas
047fb9f8b8
Website (Universe): An entry for rita-dsl (#6138)
* Create zaibacu.md

* Add RITA-DSL entry

* Update agreement

* Fix formatting
2020-10-06 11:19:36 +02:00
Florijan Stamenković
9db670b996
Fix Issue 6207 (#6208)
* Regression test for issue 6207

* Fix issue 6207

* Sign contributor agreement

* Minor adjustments to test

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-10-06 11:17:37 +02:00
Ines Montani
59deeb7da6 Merge branch 'develop' into master-tmp 2020-10-04 14:52:20 +02:00
Stanislav Schmidt
3589a64d44
Change type of texts argument in pipe to iterable (#6186)
* Change type of texts argument in pipe to iterable

* Add contributor agreement
2020-10-02 21:00:11 +02:00
Muhammad Fahmi Rasyid
7489d02dea
Update Indonesian Example Phrases (#6124)
* create contributor agreement

* Update Indonesian example. (see  #1107)

Update Indonesian examples with more proper phrases. the current phrases contains sensitive and violent words.
2020-09-23 14:02:26 +02:00
Ines Montani
864a697e63 Merge branch 'develop' into master-tmp 2020-09-04 13:15:36 +02:00
Juan Gutiérrez
9002bea29f
Update suffixes example (#5989)
* Update suffixes example

The current example will throw `TypeError: can only concatenate list (not "tuple") to list`

* Signing Contributor Agreement
2020-08-31 12:44:56 +02:00
Shashank
450720aca2
Added support for Sanskrit language (#5956)
* Added support for Sanskrit language

* Added tests for lexical attribute like_num
2020-08-25 10:56:29 +02:00
idoshr
b10c7bc56e
Hebrew like num (#5952)
* Update stop_words.py

Hebrew STOP WORDS

* Update stop_words.py

* contributor

* contributor

* add some common domain extentions
support human number 1K/1M....

* support human number 1K/1M....

* hebrew number tokenize
1K/1M implement in EN

* test human tokenize fix

* test

* heb like num
revert human number change

* heb like num
2020-08-24 14:30:05 +02:00
Attila Szász
669dc70822
Create tilusnet.md (#5914) 2020-08-12 22:46:08 +02:00