Commit Graph

464 Commits

Author SHA1 Message Date
bsweileh
42fcff6f8a Update _training.md - Fix broken link on backpropagation (#7431)
* Update _training.md

Fix broken link on backpropagation

* Add agreement

add spacy contributor agreement
2021-03-15 09:24:12 +01:00
Ines Montani
37fc495f5d
Merge pull request #7353 from jankrepl/fix_entity_rules_labels 2021-03-09 15:09:24 +01:00
Ines Montani
4f32e3dedb Update issue templates [ci skip] 2021-03-10 01:08:05 +11:00
Jan Krepl
0e1d579f0c Add agreement 2021-03-09 10:57:32 +01:00
Boian Tzonev
cca8651fc8
Bulgarian tokenizer exceptions (#7114)
* [Bulgarian] Add tokenizer exceptions and like_num for Bulgarian
2021-02-19 19:19:19 +01:00
Peter Baumann
61b04a70d5
Run PhraseMatcher on Spans (#6918)
* Add regression test

* Run PhraseMatcher on Spans

* Add test for PhraseMatcher on Spans and Docs

* Add SCA

* Add test with 3 matches in Doc, 1 match in Span

* Update docs

* Use doc.length for find_matches in tokenizer

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-02-10 23:43:32 +11:00
René Octavio Queiroz Dias
999ff03b19
fix: Fix textcat labels to expect an Optional[Iterable[str]] instead of Optional[Dict] (#6911)
* docs: Add agreement

* bug: Regression test

Issue #6908

* fix: Changed from Dict to Iterable[str]

Fix #6908

* Update test to use make_tempdir

* fix: Fix WindowsPath error

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-02-04 23:37:13 +01:00
Helio Machado
20a97cda38
Create 0x2b3bfa0.md (#6916) 2021-02-04 23:25:11 +01:00
Ines Montani
30765674d0 Merge branch 'master' into develop 2021-01-30 12:20:28 +11:00
Pamphile ROY
e496b8623f
SCA tupui 2021-01-29 15:46:53 +01:00
Ines Montani
230e651ad6 Merge branch 'develop' into master-tmp 2021-01-27 13:26:29 +11:00
Ines Montani
d5ef245bb1
Merge pull request #6822 from jganseman/master [ci skip] 2021-01-27 13:04:30 +11:00
jganseman
c9103d60fa
Create jganseman.md 2021-01-26 11:02:31 +01:00
Dhruv Naik
e7db07a0b9
Fix Span.char_span bug (#6816)
* Create dhruvrnaik.md

* add test for issue #6815

* bugfix for issue #6815

* update dhruvrnaik.md

* add span.vector test for #6815
2021-01-26 15:50:37 +08:00
muratjumashev
79327197d1 Add contributor agreement 2021-01-25 00:34:12 +06:00
KeshavG-lb
0a86d833d7
Spacy Cli info method causing backward compatibility issues (#6793)
* Spacy Cli info method causing backward compatibility issues #6791

Fix backward compatibility by giving the exclude argument of the info
method a default value.

* Using an empty list as a default argument is dangerous,
so the default is set to None and replaced with an empty list if it is None.

Reference : https://nikos7am.com/posts/mutable-default-arguments/
2021-01-23 11:21:43 +01:00
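
The mutable-default pitfall mentioned in the commit above can be illustrated with a minimal sketch. The function names `info_buggy` and `info_fixed` are hypothetical and are not spaCy's actual `info` signature; only the `exclude` parameter name comes from the commit message.

```python
from typing import List, Optional


def info_buggy(exclude: List[str] = []) -> List[str]:
    # Hypothetical example: the default list is created once, when the
    # function is defined, so every call that omits `exclude` shares it.
    exclude.append("called")
    return exclude


def info_fixed(exclude: Optional[List[str]] = None) -> List[str]:
    # Use None as the default and build a fresh list on each call.
    if exclude is None:
        exclude = []
    exclude.append("called")
    return exclude
```

Calling `info_buggy()` twice returns a list that keeps growing, while `info_fixed()` starts from an empty list every time.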
Luigi Coniglio
e83c818a78
DependencyMatcher improvements (fix #6678) (#6744)
* Adding contributor agreement for user werew

* [DependencyMatcher] Comment and clean code

* [DependencyMatcher] Use defaultdicts

* [DependencyMatcher] Simplify _retrieve_tree method

* [DependencyMatcher] Remove prepended underscores

* [DependencyMatcher] Address TODO and move grouping of token's positions out of the loop

* [DependencyMatcher] Remove _nodes attribute

* [DependencyMatcher] Use enumerate in _retrieve_tree method

* [DependencyMatcher] Clean unused vars and use camel_case naming

* [DependencyMatcher] Memoize node+operator map

* Add root property to Token

* [DependencyMatcher] Groups matches by root

* [DependencyMatcher] Remove unused _keys_to_token attribute

* [DependencyMatcher] Use a list to map tokens to matcher's keys

* [DependencyMatcher] Remove recursion

* [DependencyMatcher] Use a generator to retrieve matches

* [DependencyMatcher] Remove unused memory pool

* [DependencyMatcher] Hide private methods and attributes

* [DependencyMatcher] Improvements to the matches validation

* Apply suggestions from code review

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>

* [DependencyMatcher] Fix keys_to_position_maps

* Remove Token.root property

* [DependencyMatcher] Remove functools' lru_cache

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2021-01-22 11:20:08 +11:00
Adriane Boyd
0c936004d1 Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-rc3 2021-01-14 11:49:58 +01:00
Antonio Miras
b4bd8f347a
spaCy Universe: New project; SpacyDotNet (#6702)
* Universe: SpacyDotNet a .NET Core spaCy wrapper

* Signed contributor agreement

Co-authored-by: Antonio Miras <antonio@amiras.net>
2021-01-13 12:47:30 +11:00
Alex Combessie
9cc880014c
Remove questionable French stopwords (#6310)
* Remove questionable French stopwords

* Create alexcombessie.md
2021-01-08 11:36:22 +11:00
Cristiana S Parada
7a0222f260
Update stop_words.py in Portuguese (a,o,e) (#6345)
* Update stop_words.py

Added three additional stopwords: "a" and "o", which mean "the", and "e", which means "and"

* Create cristianasp.md

* zero edit to push CI

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-01-08 11:35:38 +11:00
Lorena Ciutacu
f11002f1f1
add new Romanian stopwords (#6621)
* add contributor agreement

* update ro stopwords list

* add new stopwords
2021-01-08 11:34:47 +11:00
ophelielacroix
e3222fdec9
Add (noun chunks) syntax iterators for Danish (#6246)
* add syntax iterators for danish

* add test noun chunks for danish syntax iterators

* add contributor agreement

* update da syntax iterators to remove nested chunks

* add tests for da noun chunks

* Fix test

* add missing import
* fix example

* Prevent overlapping noun chunks

Prevent overlapping noun chunks by tracking the end index of the
previous noun chunk span.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-01-07 16:33:00 +11:00
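
The idea of tracking the end index of the previous noun chunk, described in the commit above, can be sketched roughly as follows. This is an illustration only, not the code from the commit: the helper `non_overlapping` is hypothetical, and it assumes candidate chunks arrive in document order as `(start, end)` token offsets with an exclusive end.

```python
from typing import Iterable, Iterator, Tuple


def non_overlapping(spans: Iterable[Tuple[int, int]]) -> Iterator[Tuple[int, int]]:
    # Hypothetical helper: keep a candidate only if it starts at or after
    # the end of the last chunk that was kept.
    prev_end = 0
    for start, end in spans:
        if start < prev_end:
            # Overlaps the previously emitted chunk, so skip it.
            continue
        prev_end = end
        yield start, end


candidates = [(0, 3), (2, 4), (4, 6), (5, 7)]
kept = list(non_overlapping(candidates))
```

Because only one integer is carried between iterations, the filter runs in a single pass over the candidates.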
Bruno
1a77607036
spaCy v3 is not saving the best version in training loop (#6629)
* Save best only if is the best and also respect the average config

* Create bratao.md

* Update loop.py

* Remove average check

* Keep before_to_disk
2021-01-06 12:51:30 +11:00
Yosi
cf52510631
Add Amharic አማርኛ Language support (#6583)
* Add Amharic to spaCy

* clean up

* Add some PRON_LEMMA

* add Tigrinya support

* remove text_noun_chunks

* Tigrinya Support

* added some more details for ti

* fix unit test

* add amharic char range

* changes from review

* amharic and tigrinya share same unicode block

* get rid of _amharic/_tigrinya in char_classes

Co-authored-by: Josiah Solomon <jsolomon@meteorcomm.com>
2020-12-22 16:50:34 +01:00
Ines Montani
d8aa113d16
Merge pull request #6566 from rafguns/cite-zenodo [ci skip] 2020-12-16 16:40:50 +11:00
Thomas Bird
f6e4378942
Add SCA for @thomasbird (#6576) 2020-12-15 20:59:47 +01:00
Raf Guns
ec876c9713 Merge branch 'master' of https://github.com/explosion/spaCy into cite-zenodo 2020-12-14 22:03:58 +01:00
Raf Guns
a90ca0e1fb Add contributor agreement 2020-12-14 22:01:14 +01:00
Ines Montani
85ca8c2bdd Merge branch 'master' into develop 2020-12-11 13:44:41 +11:00
Ines Montani
1d4b1dea25 Update contributing guide and issue template [ci skip] 2020-12-11 13:39:26 +11:00
Ines Montani
c9b67b02f8 Update issue templates 2020-12-11 10:05:47 +11:00
svlandeg
4afcd9567e refer to GH discussions 2020-12-10 20:56:12 +01:00
Adriane Boyd
724831b066 Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master
* Update Macedonian for v3
* Update Turkish for v3
2020-11-25 11:49:34 +01:00
Jacob Bortell
992723dfac
Add jabortell to the contributors (#6422)
* Add jabortell to the contributors

* Update jabortell.md

Added tick to applicable statement
2020-11-24 16:15:31 +01:00
Yusuke Mori
e3ac90b035
Avoid a SyntaxError in self-attentive-parser (#6428)
* Avoid a SyntaxError in self-attentive-parser

Fix a usage of quotation marks in the example of spaCy Universe self-attentive-parser

* Create forest1988.md

Fill in the spaCy contributor agreement
2020-11-22 21:59:37 +01:00
M. Revuelta Espinosa
51232ffb9e
Update universe.json (include PatternOmatic) (#6399)
Request to include PatternOmatic in spaCy Universe

Adds @revuel to contributors
2020-11-19 13:15:50 +01:00
Daniel Vasic
20d72de986
Added Multext-East V5 tagset for Croatian language (#6248)
* Added Multext-East V5 tagset for Croatian language

* Create danielvasic.md

* Update danielvasic.md

* Update danielvasic.md

* Add tag map to CroatianDefaults

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-11-05 12:19:22 +01:00
Vu Ha
6d465ec52c
add oprd to the list of accepted deps for noun chunking (#6302)
* add oprd to the list of accepted deps for noun chunking

* add SCA
2020-11-05 09:17:35 +01:00
Ines Montani
1e4d7e059f Revert "Test FUNDING.yml [ci skip]"
This reverts commit 287be48ad0.
2020-10-28 17:42:23 +01:00
Ines Montani
287be48ad0 Test FUNDING.yml [ci skip] 2020-10-28 17:36:25 +01:00
Robert Šípek
260c29794a
Fill contributor agreement by robertsipek (#6285)
* Fill contributor agreement by robertsipek
2020-10-22 22:13:17 +02:00
Kunal Sharma
01aec7a313
Adding MindMeld to Universe JSON (#6275)
* Adding Mindmeld to Universe JSON

Mindmeld is a conversational AI platform for deep-domain voice interfaces and chatbots. https://www.mindmeld.com/

* Signing contribution agreement.

Co-authored-by: kunshar2 <kunshar2@cisco.com>
2020-10-21 18:42:11 +02:00
walterhenry
ff82644746 User contributor agreement
Here it is!
2020-10-19 16:25:09 +02:00
Jan Margeta
ed1c37189a Add contributor agreement for jmargeta 2020-10-16 00:38:42 +02:00
Borijan Georgievski
2311192ba1
Include Macedonian language (#6230)
* Include Macedonian language

* Fix indentation at char_classes.py

* Add Macedonian tests, update lex_attrs and char_classes

* Import unicode literals for python 2
2020-10-15 15:55:01 +02:00
Ines Montani
178760855f Merge branch 'develop' into master-tmp 2020-10-15 09:06:03 +02:00
Florijan Stamenković
18f5c309dc Fix Issue 6207 (#6208)
* Regression test for issue 6207

* Fix issue 6207

* Sign contributor agreement

* Minor adjustments to test

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-10-09 10:14:40 +02:00
Šarūnas Navickas
287ba94a2f Website (Universe): An entry for rita-dsl (#6138)
* Create zaibacu.md

* Add RITA-DSL entry

* Update agreement

* Fix formatting
2020-10-09 10:14:40 +02:00
delzac
668507be1b Reflect in usage docs that the IS_SENT_START attribute exists (#6114)
* Reflect in usage docs that the IS_SENT_START attribute exists

* Create delzac.md
2020-10-09 10:14:40 +02:00