Commit Graph

506 Commits

Author SHA1 Message Date
jmyerston
993b0fab0e
Added ancient Greek language support (#8606)
* Add ancient Greek language support

Initial commit

* Contributor Agreement

* grc tokenizer test added  and files formatted with black, unnecessary import removed

Co-Authored-By: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Commas in lists fixed. __init__py added to test

* Update lex_attrs.py

* Update stop_words.py

* Update stop_words.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-07-15 10:27:17 +02:00
Edward
8233359225
Fix preservation of spacy package meta (#8663)
* update package meta with existing_meta and nlp_meta

* Add spaCy contributor agreement

* Added more info when creating readme
2021-07-12 11:18:52 +02:00
Paul O'Leary McCann
1c70c87daf
Fix autoblack
The conditional needs double equals.
2021-07-10 16:02:39 +09:00
Paul O'Leary McCann
b8cdbb4bb6 Make the autoblack job not run on forks
The autoblack job is an occasional cleanup job. If it runs on forks and
those PRs are accepted the git history will be weird and that doesn't
help anyone.

The way to make the job not run on forks is a little non-obvious but
based on this thread.

https://github.com/prisma/prisma/issues/3539
2021-07-10 15:38:20 +09:00
Ines Montani
1c0ed22d1e
Merge pull request #8573 from julien-talkair/code-quality-pre-commit 2021-07-09 23:09:24 +10:00
Sofie Van Landeghem
608fc1d623
avoid msg var impliciteness (#8619)
* avoid msg var impliciteness

* rename local msg

* Add CI tests for debug data and train

* Adjust debug data CLI test

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-07-06 19:08:08 +02:00
Adriane Boyd
5fd0b5207e
Fix vectors check for sourced components (#8559)
* Fix vectors check for sourced components

Since vectors are not loaded when components are sourced, store a hash
for the vectors of each sourced component and compare it to the loaded
vectors after the vectors are loaded from the `[initialize]` block.

* Pop temporary info

* Remove stored hash in remove_pipe

* Add default for pop

* Add additional convert/debug/assemble CLI tests
2021-07-06 12:43:17 +02:00
Yoichiro Hasebe
e541092088
Create yohasebe.md 2021-07-04 08:57:04 +09:00
Ines Montani
c5c4e96597 Fix syntax [ci skip] 2021-07-02 17:46:56 +10:00
Ines Montani
6b905d67df Try workflow_dispatch and schedule [ci skip] 2021-07-02 17:45:27 +10:00
Ines Montani
70589e348e Commit as explosion-bot [ci skip] 2021-07-02 17:45:11 +10:00
Ines Montani
dd34a3a433 Try simpler approach [ci skip] 2021-07-02 17:40:49 +10:00
Ines Montani
2898331494 Improve logic [ci skip] 2021-07-02 17:37:35 +10:00
Ines Montani
519a9e29be Fix git login [ci skip] 2021-07-02 17:30:59 +10:00
Ines Montani
8961f36415 Commit manually in workflow [ci skip] 2021-07-02 17:27:48 +10:00
Ines Montani
2a5cbf1b0c Test different workflow trigger [ci skip] 2021-07-02 17:22:43 +10:00
Ines Montani
bbbaae0b5e Update triggers [ci skip] 2021-07-02 17:10:24 +10:00
Ines Montani
cdefb8cf1b Experimental: add autoblack.yml action [ci skip] 2021-07-02 17:07:05 +10:00
julien-talkair
6b1f9a5be0 add spacy contributor agreement 2021-07-01 17:41:12 +02:00
Ines Montani
88ad41316c
Update issue template [ci skip] 2021-06-28 03:11:37 +02:00
Ines Montani
db6361ab6e
Update issue template [ci skip] 2021-06-28 03:10:52 +02:00
Ines Montani
2e453bda92
Update issue links [ci skip] 2021-06-28 03:09:48 +02:00
Paul O'Leary McCann
0d3caa52a6 Update New Issue choices
This uses some new features related to Issue Templates to help direct
more people to Discussions.

1. Change the Discussions option to link to Discussions
2. Add a link to the FAQ
3. Disable blank issues
2021-06-27 14:41:33 +09:00
Adrian Zuber
f5aee0bbdf
Raise custom error in EntityLinker when KB is not set (#8442)
* Raise custom error in EntityLinker when KB is not set

* add contributor agreement

* Update E1018 error message
2021-06-25 23:04:00 +02:00
Adriane Boyd
172dfec4f2
Test download in CI with ca_core_news_sm (#8493) 2021-06-24 09:26:30 +02:00
Giovanni Toffoli
19521d525b
Added Italian POS-aware lemmatizer. (#8079)
* Added Italian POS-aware lemmatizer.

Also added the code used to build the lookup tables by POS.

* Create gtoffoli.md

* Add imports and format

* Remove helper script

* Use lemma_lookup instead of lemma_lookup_legacy

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-06-16 11:14:45 +02:00
Adriane Boyd
33240ed2c5 Temporarily skip model download test 2021-06-16 10:14:42 +02:00
Adriane Boyd
d52ab13b5f
Update CI: update ubuntu image, add download test (#8298)
* Update CI: update ubuntu image, add download test

* Switch instances to `ubuntu-18.04`
* Add model download test, currently only for one job with python 3.8

* Fix variable name

* Set variables explicitly
2021-06-07 14:46:07 +02:00
Vito De Tullio
3672464e25
applying suggestion to avoid mypy errors (#8265)
* applying suggestion to avoid mypy errors

* sign contributor agreement
2021-06-02 19:25:30 +10:00
Kristian Boda
dc8d8d15d2
Add hmrb to spaCy Universe (#8129)
* docs: add hmrb to spacy universe

* docs: add sentence on spacy versions

* docs: update description and images

* misc: add spaCy Contributor Agreement
2021-05-31 18:40:48 +10:00
Narayan Acharya
6b79714080
Address missing config overrides post load of models (#8208) 2021-05-31 18:36:52 +10:00
Julien Salinas
a176d2209a Sign contributors agreement. 2021-05-14 11:00:27 +02:00
Sevdimali
49aed683cc
Azerbaijani language added (#7911) 2021-04-28 14:42:02 +02:00
Adriane Boyd
f4080983ea
Extend to cupy 9.0.0 (#7914) 2021-04-28 10:18:24 +02:00
Janis Klaise
1690595e4d
Update load_lookups return type and docstring (#7907)
* Update load_lookups return type and docstring

* Add contributor agreement
2021-04-27 09:13:39 +02:00
Adriane Boyd
36ecba224e
Set up GPU CI testing (#7293)
* Set up CI for tests with GPU agent

* Update tests for enabled GPU

* Fix steps filename

* Add parallel build jobs as a setting

* Fix test requirements

* Fix install test requirements condition

* Fix pipeline models test

* Reset current ops in prefer/require testing

* Fix more tests

* Remove separate test_models test

* Fix regression 5551

* fix StaticVectors for GPU use

* fix vocab tests

* Fix regression test 5082

* Move azure steps to .github and reenable default pool jobs

* Consolidate/rename azure steps

Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
2021-04-22 14:58:29 +02:00
meghanabhange
49ff1126bf
Project Idea : denomme | Multilingual Name Detection (#7845)
* Add denomme

* spaCy contributor agreement

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-04-22 08:48:17 +02:00
Pierre Lison
2f0ef2c9cc adding skweak to the SpaCy universe 2021-04-22 01:16:34 +02:00
Shantam Raj
6017fcf693
Default code for Setting Entity annotations on the website errors (#7738)
* the default example for "Setting entity annotations" errors on Binder

* updating contributer info

* using a new variable to store original entities
2021-04-21 09:16:32 +02:00
broaddeep
ee159b8543
Support match alignments (#7321)
* Support match alignments

* change naming from match_alignments to with_alignments, add conditional flow if with_alignments is given, validate with_alignments, add related test case

* remove added errors, utilize bint type, cleanup whitespace

* fix no new line in end of file

* Minor formatting

* Skip alignments processing if as_spans is set

* Add with_alignments to Matcher API docs

* Update website/docs/api/matcher.md

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-04-08 18:10:14 +10:00
Sam Edwardes
f6ad4684bd
Updates to universe.json for spaCyTextBlob (#7647)
* Updates to universe.json for spaCyTextBlob

Updated the documentation for spaCy 3.0.

* SamEdwardes.md

* Update SamEdwardes.md
2021-04-04 20:17:57 +02:00
Ayush Chaurasia
3c2ce41dd8
W&B integration: Optional support for dataset and model checkpoint logging and versioning (#7429)
* Add optional artifacts logging

* Update docs

* Update spacy/training/loggers.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/training/loggers.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/training/loggers.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Bump WandbLogger Version

* Add documentation of v1 to legacy docs

* bump spacy-legacy to 3.0.2 (to be released)

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: svlandeg <sofie.vanlandeghem@gmail.com>
2021-04-01 19:36:23 +02:00
bsweileh
61472e7cb3
Update _training.md - Fix broken link on backpropagation (#7431)
* Update _training.md

Fix broken link on backpropagation

* Add agreement

add spacy contributor agreement
2021-03-15 09:21:35 +01:00
Ines Montani
37fc495f5d
Merge pull request #7353 from jankrepl/fix_entity_rules_labels 2021-03-09 15:09:24 +01:00
Ines Montani
4f32e3dedb Update issue templates [ci skip] 2021-03-10 01:08:05 +11:00
Jan Krepl
0e1d579f0c Add agreement 2021-03-09 10:57:32 +01:00
Boian Tzonev
cca8651fc8
Bulgarian tokenizer exceptions (#7114)
* [Bulgarian] Add tokenizer exceptions and like_num for Bulgarian

* [Bulgarian] Add tokenizer exceptions and like_num for Bulgarian
2021-02-19 19:19:19 +01:00
Peter Baumann
61b04a70d5
Run PhraseMatcher on Spans (#6918)
* Add regression test

* Run PhraseMatcher on Spans

* Add test for PhraseMatcher on Spans and Docs

* Add SCA

* Add test with 3 matches in Doc, 1 match in Span

* Update docs

* Use doc.length for find_matches in tokenizer

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-02-10 23:43:32 +11:00
René Octavio Queiroz Dias
999ff03b19
fix: Fix textcat labels to expect a Optional[Iterable[str]] instead of Optional[Dict] (#6911)
* docs: Add agreement

* bug: Regression test

Issue #6908

* fix: Changed from Dict to Iterable[str]

Fix #6908

* Update test to use make_tempdir

* fix: Fix WindowsPath error

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-02-04 23:37:13 +01:00
Helio Machado
20a97cda38
Create 0x2b3bfa0.md (#6916) 2021-02-04 23:25:11 +01:00