Commit Graph

566 Commits

Author SHA1 Message Date
Baltazar
4d85cb88a5 added contribution license 2021-08-19 21:45:18 +02:00
Steele Farnsworth
b18cb1cd2a
Refactor dependencymatcher.pyx to use list comps and enumerate. (#8956)
* Refactor to use list comps and enumerate.

Replace loops that append to a list with list comprehensions where this does not change the behavior; replace range(len(...)) loops with enumerate. Correct one typo in a comment. Replace a call to set() with a set literal.

* Undo double assignment.

Expand `tokens_to_key[j] = k = self._get_matcher_key(key, i, j)` to two statements.

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Sign contributors agreement

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-08-18 09:55:45 +02:00
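
Note on b18cb1cd2a: the refactoring patterns named in that commit message can be illustrated with a minimal sketch. All names below (`tokens`, `lengths_loop`, `lengths_comp`, `make_key`, `keys`) are hypothetical and are not taken from dependencymatcher.pyx; the sketch only shows the general before/after shape of each change.

```python
# Hypothetical data; not taken from dependencymatcher.pyx.
tokens = ["the", "quick", "fox"]


def lengths_loop(items):
    # Before: a range(len(...)) loop that appends to a list.
    result = []
    for i in range(len(items)):
        result.append((i, len(items[i])))
    return result


def lengths_comp(items):
    # After: a list comprehension over enumerate, with the same behavior.
    return [(i, len(item)) for i, item in enumerate(items)]


# Before: a set() call wrapping a list. After: a set literal.
seen_before = set([0])
seen_after = {0}


def make_key(i, j):
    # Hypothetical stand-in for a key-building helper.
    return (i, j)


# A chained assignment such as `keys[1] = k = make_key(0, 1)`
# expanded into two separate statements.
keys = {}
k = make_key(0, 1)
keys[1] = k
```

Both versions of the length function return the same list, which is the condition the commit message gives for making the replacement.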
Lasse
195e4e48c3 add textdescriptives to universe 2021-08-13 14:35:18 +02:00
fgaim
ee011ca963
Update Tigrinya ትግርኛ language support (#8900)
* Add missing punctuation for Tigrinya and Amharic

* Fix numeral and ordinal numbers for Tigrinya

 - Amharic was used in many cases
 - Also fixed some typos

* Update Tigrinya stop-words

* Contributor agreement for fgaim

* Fix typo in "ti" lang test

* Remove multi-word entries from numbers and ordinals
2021-08-10 13:55:08 +02:00
Dimitar Ganev
733ffe439d
Improve the stop words and the tokenizer exceptions in Bulgarian language. (#8862)
* Add more stop words and Improve the readability

* Add and categorize the tokenizer exceptions for `bg` lang

* Create syrull.md

* Add references for the additional stop words and tokenizer exc abbrs
2021-08-10 13:44:23 +02:00
Eduard Zorita
439f30faad
Add stub files for main cython classes (#8427)
* Add stub files for main API classes

* Add contributor agreement for ezorita

* Update types for ndarray and hash()

* Fix __getitem__ and __iter__

* Add attributes of Doc and Token classes

* Overload type hints for Span.__getitem__

* Fix type hint overload for Span.__getitem__

Co-authored-by: Luca Dorigo <dorigoluca@gmail.com>
2021-08-07 12:30:03 +02:00
Nick Sorros
0485cdefcc
Add logger debug for project push and pull (#8860)
* Add logger debug for project push and pull

* Sign contributor agreement
2021-08-02 18:13:53 +02:00
Ines Montani
51e5903d6f
Merge pull request #8702 from KennethEnevoldsen/master [ci skip] 2021-07-18 13:18:42 +10:00
Mario Šaško
1ba2e8a646
Add TakeLab/spacy-udpipe to Universe (#8698)
* Add TakeLab/spacy-udpipe to universe

* Add SCA

* Sign SCA
2021-07-16 11:15:52 +02:00
jmyerston
993b0fab0e
Added ancient Greek language support (#8606)
* Add ancient Greek language support

Initial commit

* Contributor Agreement

* grc tokenizer test added and files formatted with black, unnecessary import removed

Co-Authored-By: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Commas in lists fixed. __init__.py added to test

* Update lex_attrs.py

* Update stop_words.py

* Update stop_words.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-07-15 10:27:17 +02:00
KennethEnevoldsen
e5127992a0 added agreement 2021-07-13 10:11:02 +02:00
Edward
8233359225
Fix preservation of spacy package meta (#8663)
* update package meta with existing_meta and nlp_meta

* Add spaCy contributor agreement

* Added more info when creating readme
2021-07-12 11:18:52 +02:00
Paul O'Leary McCann
1c70c87daf
Fix autoblack
The conditional needs double equals.
2021-07-10 16:02:39 +09:00
Paul O'Leary McCann
b8cdbb4bb6 Make the autoblack job not run on forks
The autoblack job is an occasional cleanup job. If it runs on forks and
those PRs are accepted the git history will be weird and that doesn't
help anyone.

The way to make the job not run on forks is a little non-obvious but
based on this thread.

https://github.com/prisma/prisma/issues/3539
2021-07-10 15:38:20 +09:00
Ines Montani
1c0ed22d1e
Merge pull request #8573 from julien-talkair/code-quality-pre-commit 2021-07-09 23:09:24 +10:00
Sofie Van Landeghem
608fc1d623
avoid msg var implicitness (#8619)
* avoid msg var implicitness

* rename local msg

* Add CI tests for debug data and train

* Adjust debug data CLI test

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-07-06 19:08:08 +02:00
Adriane Boyd
5fd0b5207e
Fix vectors check for sourced components (#8559)
* Fix vectors check for sourced components

Since vectors are not loaded when components are sourced, store a hash
for the vectors of each sourced component and compare it to the loaded
vectors after the vectors are loaded from the `[initialize]` block.

* Pop temporary info

* Remove stored hash in remove_pipe

* Add default for pop

* Add additional convert/debug/assemble CLI tests
2021-07-06 12:43:17 +02:00
Yoichiro Hasebe
e541092088
Create yohasebe.md 2021-07-04 08:57:04 +09:00
Ines Montani
c5c4e96597 Fix syntax [ci skip] 2021-07-02 17:46:56 +10:00
Ines Montani
6b905d67df Try workflow_dispatch and schedule [ci skip] 2021-07-02 17:45:27 +10:00
Ines Montani
70589e348e Commit as explosion-bot [ci skip] 2021-07-02 17:45:11 +10:00
Ines Montani
dd34a3a433 Try simpler approach [ci skip] 2021-07-02 17:40:49 +10:00
Ines Montani
2898331494 Improve logic [ci skip] 2021-07-02 17:37:35 +10:00
Ines Montani
519a9e29be Fix git login [ci skip] 2021-07-02 17:30:59 +10:00
Ines Montani
8961f36415 Commit manually in workflow [ci skip] 2021-07-02 17:27:48 +10:00
Ines Montani
2a5cbf1b0c Test different workflow trigger [ci skip] 2021-07-02 17:22:43 +10:00
Ines Montani
bbbaae0b5e Update triggers [ci skip] 2021-07-02 17:10:24 +10:00
Ines Montani
cdefb8cf1b Experimental: add autoblack.yml action [ci skip] 2021-07-02 17:07:05 +10:00
julien-talkair
6b1f9a5be0 add spacy contributor agreement 2021-07-01 17:41:12 +02:00
Ines Montani
88ad41316c
Update issue template [ci skip] 2021-06-28 03:11:37 +02:00
Ines Montani
db6361ab6e
Update issue template [ci skip] 2021-06-28 03:10:52 +02:00
Ines Montani
2e453bda92
Update issue links [ci skip] 2021-06-28 03:09:48 +02:00
Paul O'Leary McCann
0d3caa52a6 Update New Issue choices
This uses some new features related to Issue Templates to help direct
more people to Discussions.

1. Change the Discussions option to link to Discussions
2. Add a link to the FAQ
3. Disable blank issues
2021-06-27 14:41:33 +09:00
Adrian Zuber
f5aee0bbdf
Raise custom error in EntityLinker when KB is not set (#8442)
* Raise custom error in EntityLinker when KB is not set

* add contributor agreement

* Update E1018 error message
2021-06-25 23:04:00 +02:00
Adriane Boyd
172dfec4f2
Test download in CI with ca_core_news_sm (#8493) 2021-06-24 09:26:30 +02:00
Giovanni Toffoli
19521d525b
Added Italian POS-aware lemmatizer. (#8079)
* Added Italian POS-aware lemmatizer.

Also added the code used to build the lookup tables by POS.

* Create gtoffoli.md

* Add imports and format

* Remove helper script

* Use lemma_lookup instead of lemma_lookup_legacy

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-06-16 11:14:45 +02:00
Adriane Boyd
33240ed2c5 Temporarily skip model download test 2021-06-16 10:14:42 +02:00
Adriane Boyd
d52ab13b5f
Update CI: update ubuntu image, add download test (#8298)
* Update CI: update ubuntu image, add download test

* Switch instances to `ubuntu-18.04`
* Add model download test, currently only for one job with python 3.8

* Fix variable name

* Set variables explicitly
2021-06-07 14:46:07 +02:00
Vito De Tullio
3672464e25
applying suggestion to avoid mypy errors (#8265)
* applying suggestion to avoid mypy errors

* sign contributor agreement
2021-06-02 19:25:30 +10:00
Kristian Boda
dc8d8d15d2
Add hmrb to spaCy Universe (#8129)
* docs: add hmrb to spacy universe

* docs: add sentence on spacy versions

* docs: update description and images

* misc: add spaCy Contributor Agreement
2021-05-31 18:40:48 +10:00
Narayan Acharya
6b79714080
Address missing config overrides post load of models (#8208) 2021-05-31 18:36:52 +10:00
Julien Salinas
a176d2209a Sign contributors agreement. 2021-05-14 11:00:27 +02:00
Sevdimali
49aed683cc
Azerbaijani language added (#7911) 2021-04-28 14:42:02 +02:00
Adriane Boyd
f4080983ea
Extend to cupy 9.0.0 (#7914) 2021-04-28 10:18:24 +02:00
Janis Klaise
1690595e4d
Update load_lookups return type and docstring (#7907)
* Update load_lookups return type and docstring

* Add contributor agreement
2021-04-27 09:13:39 +02:00
Adriane Boyd
36ecba224e
Set up GPU CI testing (#7293)
* Set up CI for tests with GPU agent

* Update tests for enabled GPU

* Fix steps filename

* Add parallel build jobs as a setting

* Fix test requirements

* Fix install test requirements condition

* Fix pipeline models test

* Reset current ops in prefer/require testing

* Fix more tests

* Remove separate test_models test

* Fix regression 5551

* fix StaticVectors for GPU use

* fix vocab tests

* Fix regression test 5082

* Move azure steps to .github and reenable default pool jobs

* Consolidate/rename azure steps

Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
2021-04-22 14:58:29 +02:00
meghanabhange
49ff1126bf
Project Idea : denomme | Multilingual Name Detection (#7845)
* Add denomme

* spaCy contributor agreement

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-04-22 08:48:17 +02:00
Pierre Lison
2f0ef2c9cc adding skweak to the spaCy Universe 2021-04-22 01:16:34 +02:00
Shantam Raj
6017fcf693
Default code for Setting Entity annotations on the website errors (#7738)
* the default example for "Setting entity annotations" errors on Binder

* updating contributor info

* using a new variable to store original entities
2021-04-21 09:16:32 +02:00
broaddeep
ee159b8543
Support match alignments (#7321)
* Support match alignments

* change naming from match_alignments to with_alignments, add conditional flow if with_alignments is given, validate with_alignments, add related test case

* remove added errors, utilize bint type, cleanup whitespace

* fix no new line in end of file

* Minor formatting

* Skip alignments processing if as_spans is set

* Add with_alignments to Matcher API docs

* Update website/docs/api/matcher.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-04-08 18:10:14 +10:00
Sam Edwardes
f6ad4684bd
Updates to universe.json for spaCyTextBlob (#7647)
* Updates to universe.json for spaCyTextBlob

Updated the documentation for spaCy 3.0.

* SamEdwardes.md

* Update SamEdwardes.md
2021-04-04 20:17:57 +02:00
Ayush Chaurasia
3c2ce41dd8
W&B integration: Optional support for dataset and model checkpoint logging and versioning (#7429)
* Add optional artifacts logging

* Update docs

* Update spacy/training/loggers.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/training/loggers.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/training/loggers.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Bump WandbLogger Version

* Add documentation of v1 to legacy docs

* bump spacy-legacy to 3.0.2 (to be released)

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
2021-04-01 19:36:23 +02:00
bsweileh
61472e7cb3
Update _training.md - Fix broken link on backpropagation (#7431)
* Update _training.md

Fix broken link on backpropagation

* Add agreement

add spacy contributor agreement
2021-03-15 09:21:35 +01:00
Ines Montani
37fc495f5d
Merge pull request #7353 from jankrepl/fix_entity_rules_labels 2021-03-09 15:09:24 +01:00
Ines Montani
4f32e3dedb Update issue templates [ci skip] 2021-03-10 01:08:05 +11:00
Jan Krepl
0e1d579f0c Add agreement 2021-03-09 10:57:32 +01:00
Boian Tzonev
cca8651fc8
Bulgarian tokenizer exceptions (#7114)
* [Bulgarian] Add tokenizer exceptions and like_num for Bulgarian

* [Bulgarian] Add tokenizer exceptions and like_num for Bulgarian
2021-02-19 19:19:19 +01:00
Peter Baumann
61b04a70d5
Run PhraseMatcher on Spans (#6918)
* Add regression test

* Run PhraseMatcher on Spans

* Add test for PhraseMatcher on Spans and Docs

* Add SCA

* Add test with 3 matches in Doc, 1 match in Span

* Update docs

* Use doc.length for find_matches in tokenizer

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-02-10 23:43:32 +11:00
René Octavio Queiroz Dias
999ff03b19
fix: Fix textcat labels to expect a Optional[Iterable[str]] instead of Optional[Dict] (#6911)
* docs: Add agreement

* bug: Regression test

Issue #6908

* fix: Changed from Dict to Iterable[str]

Fix #6908

* Update test to use make_tempdir

* fix: Fix WindowsPath error

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-02-04 23:37:13 +01:00
Helio Machado
20a97cda38
Create 0x2b3bfa0.md (#6916) 2021-02-04 23:25:11 +01:00
Ines Montani
30765674d0 Merge branch 'master' into develop 2021-01-30 12:20:28 +11:00
Pamphile ROY
e496b8623f
SCA tupui 2021-01-29 15:46:53 +01:00
Ines Montani
230e651ad6 Merge branch 'develop' into master-tmp 2021-01-27 13:26:29 +11:00
Ines Montani
d5ef245bb1
Merge pull request #6822 from jganseman/master [ci skip] 2021-01-27 13:04:30 +11:00
jganseman
c9103d60fa
Create jganseman.md 2021-01-26 11:02:31 +01:00
Dhruv Naik
e7db07a0b9
Fix Span.char_span bug (#6816)
* Create dhruvrnaik.md

* add test for issue #6815

* bugfix for issue #6815

* update dhruvrnaik.md

* add span.vector test for #6815
2021-01-26 15:50:37 +08:00
muratjumashev
79327197d1 Add contributor agreement 2021-01-25 00:34:12 +06:00
KeshavG-lb
0a86d833d7
Spacy Cli info method causing backward compatibility issues (#6793)
* Spacy Cli info method causing backward compatibility issues #6791

fix backward compatibility by setting default value to exclude in info
method.

* Setting an empty list as a default argument is dangerous,
so set the default to None and then replace it with an empty list if it is None.

Reference : https://nikos7am.com/posts/mutable-default-arguments/
2021-01-23 11:21:43 +01:00
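
Note on 0a86d833d7: the pitfall that commit message describes is the general Python behavior of mutable default arguments. A minimal sketch follows, assuming a parameter named `exclude` as in the message; the function names `info_mutable` and `info_safe` and their bodies are hypothetical and do not reproduce spaCy's actual `info` implementation.

```python
def info_mutable(exclude=[]):
    # Pitfall: the default list is created once, when the function is
    # defined, and is then shared by every call that omits `exclude`.
    exclude.append("x")
    return exclude


def info_safe(exclude=None):
    # Pattern from the commit message: default to None and create a
    # fresh empty list inside the function when no value was passed.
    if exclude is None:
        exclude = []
    exclude.append("x")
    return exclude
```

Calling `info_mutable()` repeatedly keeps growing the same list, while each call to `info_safe()` starts from a new empty list.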
Luigi Coniglio
e83c818a78
DependencyMatcher improvements (fix #6678) (#6744)
* Adding contributor agreement for user werew

* [DependencyMatcher] Comment and clean code

* [DependencyMatcher] Use defaultdicts

* [DependencyMatcher] Simplify _retrieve_tree method

* [DependencyMatcher] Remove prepended underscores

* [DependencyMatcher] Address TODO and move grouping of token's positions out of the loop

* [DependencyMatcher] Remove _nodes attribute

* [DependencyMatcher] Use enumerate in _retrieve_tree method

* [DependencyMatcher] Clean unused vars and use camel_case naming

* [DependencyMatcher] Memoize node+operator map

* Add root property to Token

* [DependencyMatcher] Groups matches by root

* [DependencyMatcher] Remove unused _keys_to_token attribute

* [DependencyMatcher] Use a list to map tokens to matcher's keys

* [DependencyMatcher] Remove recursion

* [DependencyMatcher] Use a generator to retrieve matches

* [DependencyMatcher] Remove unused memory pool

* [DependencyMatcher] Hide private methods and attributes

* [DependencyMatcher] Improvements to the matches validation

* Apply suggestions from code review

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>

* [DependencyMatcher] Fix keys_to_position_maps

* Remove Token.root property

* [DependencyMatcher] Remove functools' lru_cache

Co-authored-by: Matthew Honnibal <honnibal+gh@gmail.com>
2021-01-22 11:20:08 +11:00
Adriane Boyd
0c936004d1 Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-rc3 2021-01-14 11:49:58 +01:00
Antonio Miras
b4bd8f347a
spaCy Universe: New project; SpacyDotNet (#6702)
* Universe: SpacyDotNet a .NET Core spaCy wrapper

* Signed contributor agreement

Co-authored-by: Antonio Miras <antonio@amiras.net>
2021-01-13 12:47:30 +11:00
Alex Combessie
9cc880014c
Remove questionable French stopwords (#6310)
* Remove questionable French stopwords

* Create alexcombessie.md
2021-01-08 11:36:22 +11:00
Cristiana S Parada
7a0222f260
Update stop_words.py in Portuguese (a,o,e) (#6345)
* Update stop_words.py

Added three additional stopwords: "a" and "o", which mean "the", and "e", which means "and"

* Create cristianasp.md

* zero edit to push CI

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-01-08 11:35:38 +11:00
Lorena Ciutacu
f11002f1f1
add new Romanian stopwords (#6621)
* add contributor agreement

* update ro stopwords list

* add new stopwords
2021-01-08 11:34:47 +11:00
ophelielacroix
e3222fdec9
Add (noun chunks) syntax iterators for Danish (#6246)
* add syntax iterators for danish

* add test noun chunks for danish syntax iterators

* add contributor agreement

* update da syntax iterators to remove nested chunks

* add tests for da noun chunks

* Fix test

* add missing import
* fix example

* Prevent overlapping noun chunks

Prevent overlapping noun chunks by tracking the end index of the
previous noun chunk span.

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-01-07 16:33:00 +11:00
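
Note on e3222fdec9: the idea of preventing overlapping noun chunks by tracking the end index of the previous chunk can be sketched as below. This is an illustration only, under stated assumptions: candidate spans are plain `(start, end)` tuples with an exclusive end, they arrive sorted by start, and the function name `non_overlapping` is hypothetical. It is not the Danish syntax iterator itself.

```python
def non_overlapping(spans):
    """Yield spans in order, skipping any that overlap the last kept span.

    Assumes `spans` is an iterable of (start, end) tuples with an
    exclusive end index, sorted by start (an assumption of this sketch).
    """
    prev_end = 0  # end index of the most recently yielded span
    for start, end in spans:
        if start < prev_end:
            # Candidate begins inside the previous chunk: skip it.
            continue
        prev_end = end
        yield (start, end)


# Hypothetical candidate chunks; the second and fourth overlap earlier ones.
candidates = [(0, 3), (1, 2), (3, 5), (4, 6)]
kept = list(non_overlapping(candidates))
```

A single pass with one remembered index is enough here because the candidates are assumed to be sorted; unsorted input would need sorting first.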
Bruno
1a77607036
spaCy v3 is not saving the best version in training loop (#6629)
* Save best only if is the best and also respect the average config

* Create bratao.md

* Update loop.py

* Remove average check

* Keep before_to_disk
2021-01-06 12:51:30 +11:00
Yosi
cf52510631
Add Amharic አማርኛ Language support (#6583)
* Add Amharic to space

* clean up

* Add some PRON_LEMMA

* add Tigrinya support

* remove text_noun_chunks

* Tigrinya Support

* added some more details for ti

* fix unit test

* add amharic char range

* changes from review

* amharic and tigrinya share same unicode block

* get rid of _amharic/_tigrinya in char_classes

Co-authored-by: Josiah Solomon <jsolomon@meteorcomm.com>
2020-12-22 16:50:34 +01:00
Ines Montani
d8aa113d16
Merge pull request #6566 from rafguns/cite-zenodo [ci skip] 2020-12-16 16:40:50 +11:00
Thomas Bird
f6e4378942
Add SCA for @thomasbird (#6576) 2020-12-15 20:59:47 +01:00
Raf Guns
ec876c9713 Merge branch 'master' of https://github.com/explosion/spaCy into cite-zenodo 2020-12-14 22:03:58 +01:00
Raf Guns
a90ca0e1fb Add contributor agreement 2020-12-14 22:01:14 +01:00
Ines Montani
85ca8c2bdd Merge branch 'master' into develop 2020-12-11 13:44:41 +11:00
Ines Montani
1d4b1dea25 Update contributing guide and issue template [ci skip] 2020-12-11 13:39:26 +11:00
Ines Montani
c9b67b02f8 Update issue templates 2020-12-11 10:05:47 +11:00
svlandeg
4afcd9567e refer to GH discussions 2020-12-10 20:56:12 +01:00
Adriane Boyd
724831b066 Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master
* Update Macedonian for v3
* Update Turkish for v3
2020-11-25 11:49:34 +01:00
Jacob Bortell
992723dfac
Add jabortell to the contributors (#6422)
* Add jabortell to the contributors

* Update jabortell.md

Added tick to applicable statement
2020-11-24 16:15:31 +01:00
Yusuke Mori
e3ac90b035
Avoid a SyntaxError in self-attentive-parser (#6428)
* Avoid a SyntaxError in self-attentive-parser

Fix a usage of quotation marks in the example of spaCy Universe self-attentive-parser

* Create forest1988.md

Fill in the spaCy contributor agreement
2020-11-22 21:59:37 +01:00
M. Revuelta Espinosa
51232ffb9e
Update universe.json (include PatternOmatic) (#6399)
Request to include PatternOmatic in spaCy Universe

Adds @revuel to contributors
2020-11-19 13:15:50 +01:00
Daniel Vasic
20d72de986
Added Multext-East V5 tagset for Croatian language (#6248)
* Added Multext-East V5 tagset for Croatian language

* Create danielvasic.md

* Update danielvasic.md

* Update danielvasic.md

* Add tag map to CroatianDefaults

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-11-05 12:19:22 +01:00
Vu Ha
6d465ec52c
add oprd to the list of accepted deps for noun chunking (#6302)
* add oprd to the list of accepted deps for noun chunking

* add SCA
2020-11-05 09:17:35 +01:00
Ines Montani
1e4d7e059f Revert "Test FUNDING.yml [ci skip]"
This reverts commit 287be48ad0.
2020-10-28 17:42:23 +01:00
Ines Montani
287be48ad0 Test FUNDING.yml [ci skip] 2020-10-28 17:36:25 +01:00
Robert Šípek
260c29794a
Fill contributor agreement by robertsipek (#6285)
* Fill contributor agreement by robertsipek

* Fill contributor agreement by robertsipek
2020-10-22 22:13:17 +02:00
Kunal Sharma
01aec7a313
Adding MindMeld to Universe JSON (#6275)
* Adding Mindmeld to Universe JSON

Mindmeld is a conversational AI platform for deep-domain voice interfaces and chatbots. https://www.mindmeld.com/

* Signing contribution agreement.

Co-authored-by: kunshar2 <kunshar2@cisco.com>
2020-10-21 18:42:11 +02:00
walterhenry
ff82644746 User contributor agreement
Here it is!
2020-10-19 16:25:09 +02:00
Jan Margeta
ed1c37189a Add contributor agreement for jmargeta 2020-10-16 00:38:42 +02:00
Borijan Georgievski
2311192ba1
Include Macedonian language (#6230)
* Include Macedonian language

* Fix indentation at char_classes.py

* Fix indentation at char_classes.py

* Add Macedonian tests, update lex_attrs and char_classes

* Import unicode literals for python 2
2020-10-15 15:55:01 +02:00
Ines Montani
178760855f Merge branch 'develop' into master-tmp 2020-10-15 09:06:03 +02:00
Florijan Stamenković
18f5c309dc Fix Issue 6207 (#6208)
* Regression test for issue 6207

* Fix issue 6207

* Sign contributor agreement

* Minor adjustments to test

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2020-10-09 10:14:40 +02:00