14803 Commits

Author SHA1 Message Date
Ines Montani
483f3175cb Tidy up [ci skip] 2021-07-17 13:43:15 +10:00
Ines Montani
15e6578f7d
Adjust formatting 2021-07-17 10:49:13 +10:00
Mario Šaško
1ba2e8a646
Add TakeLab/spacy-udpipe to Universe (#8698)
* Add TakeLab/spacy-udpipe to universe

* Add SCA

* Sign SCA
2021-07-16 11:15:52 +02:00
explosion-bot
eff3d1088b Auto-format code with black 2021-07-16 08:03:36 +00:00
Adriane Boyd
f5acc48111
Remove TrainablePipe as base class for Lemmatizer in API docs (#8725) 2021-07-15 16:41:36 +02:00
Adriane Boyd
ac45c7c045
Add pre-commit to ignored requirements (#8728) 2021-07-15 16:41:15 +02:00
jmyerston
993b0fab0e
Added ancient Greek language support (#8606)
* Add ancient Greek language support

Initial commit

* Contributor Agreement

* grc tokenizer test added and files formatted with black, unnecessary import removed

Co-Authored-By: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Commas in lists fixed. __init__.py added to test

* Update lex_attrs.py

* Update stop_words.py

* Update stop_words.py

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-07-15 10:27:17 +02:00
Sofie Van Landeghem
77859beb99
spacy.ngram_range_suggester.v1 (#8699) 2021-07-15 10:01:22 +02:00
Julien Rossi
e117573822
Adding noun_chunks to the DUTCH language model (nl) (#8529)
* implement noun_chunks for Dutch language

* copy/paste FR and SV syntax iterators to accommodate UD tags
* added tests with Dutch text
* signed contributor agreement

* 🐛 fix noun chunks generator

* built from scratch
* define noun chunk as a single Noun-Phrase
* includes some corner cases debugging (incorrect POS tagging)
* test with provided annotated sample (POS, DEP)

* fix failing test

* CI pipeline did not like the added sample file
* add the sample as a pytest fixture

* Update spacy/lang/nl/syntax_iterators.py

* Update spacy/lang/nl/syntax_iterators.py

Code readability

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/tests/lang/nl/test_noun_chunks.py

correct comment

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* finalize code

* change "if next_word" into "if next_word is not None"

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-07-14 14:01:02 +02:00
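The last change listed in the commit above, `if next_word` to `if next_word is not None`, matters because a truthiness test and a `None` check are not the same thing in Python. A minimal sketch, assuming a hypothetical `FakeToken` class (not spaCy's `Token`) whose `__len__` can return 0:

```python
class FakeToken:
    """Hypothetical stand-in for a token; not spaCy's Token class."""

    def __init__(self, text):
        self.text = text

    def __len__(self):
        # Without __bool__, Python uses __len__ to decide truthiness.
        return len(self.text)


def truthy_check(next_word):
    # Treats every falsy object as missing, including a zero-length token.
    return bool(next_word)


def none_check(next_word):
    # Treats only None as missing.
    return next_word is not None


empty = FakeToken("")
results = (truthy_check(empty), none_check(empty), none_check(None))
```

Here the zero-length object fails the truthiness test even though it exists, while the explicit comparison rejects only `None`.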
Ines Montani
2a8eeed5da
Merge pull request #8703 from thomashacker/update/spacy-stanza [ci skip]
Update spacy-stanza universe.json
2021-07-13 19:03:42 +10:00
thomashacker
aafb89df78 Update universe.json code_example 2021-07-13 10:22:49 +02:00
KennethEnevoldsen
e5127992a0 added agreement 2021-07-13 10:11:02 +02:00
Kenneth Enevoldsen
94ce904e10
added missing comma 2021-07-13 09:59:34 +02:00
Kenneth Enevoldsen
a81fcc81b0
added dacy to universe 2021-07-13 09:54:08 +02:00
Adriane Boyd
f9fd2889b7
Use 0-vector for OOV lexemes (#8639) 2021-07-13 14:48:12 +10:00
Edward
8233359225
Fix preservation of spacy package meta (#8663)
* update package meta with existing_meta and nlp_meta

* Add spaCy contributor agreement

* Added more info when creating readme
2021-07-12 11:18:52 +02:00
Paul O'Leary McCann
1c70c87daf
Fix autoblack
The conditional needs double equals.
2021-07-10 16:02:39 +09:00
Ines Montani
616f4de034
Merge pull request #8674 from polm/fix/autoblack-no-forks [ci skip]
Make the autoblack job not run on forks
2021-07-10 16:41:59 +10:00
Paul O'Leary McCann
b8cdbb4bb6 Make the autoblack job not run on forks
The autoblack job is an occasional cleanup job. If it runs on forks and
those PRs are accepted the git history will be weird and that doesn't
help anyone.

The way to make the job not run on forks is a little non-obvious, but it is
based on this thread:

https://github.com/prisma/prisma/issues/3539
2021-07-10 15:38:20 +09:00
Ines Montani
d4fecdfb82
Merge pull request #8665 from rynoV/patch-1 [ci skip] 2021-07-10 10:52:15 +10:00
Ines Montani
50000d37e4
Avoid double parentheses [ci skip] 2021-07-10 10:52:01 +10:00
Calum Sieppert
e2d53aa1a6
Typo fixes 2021-07-09 10:25:56 -06:00
Adriane Boyd
d8805a1073
Fix ru/uk lemmatizer mp with spawn (#8657)
Use an instance variable instead of a class variable for the morphological
analyzer so that multiprocessing with spawn is possible.
2021-07-09 15:36:56 +02:00
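The commit above moves the analyzer from a class variable to an instance variable. A minimal sketch of why that can matter, using hypothetical classes and a plain dict as a placeholder for the analyzer (none of these names are taken from spaCy): a value assigned on the class never appears in the instance's own `__dict__`, and the instance `__dict__` is what default pickling serializes when an object is handed to a spawned worker process.

```python
class ClassLevelLemmatizer:
    """Hypothetical: keeps the analyzer on the class."""

    analyzer = None  # shared by all instances, set at runtime

    def __init__(self):
        if ClassLevelLemmatizer.analyzer is None:
            ClassLevelLemmatizer.analyzer = {"lang": "ru"}  # placeholder object


class InstanceLevelLemmatizer:
    """Hypothetical: keeps the analyzer on each instance."""

    def __init__(self):
        self.analyzer = {"lang": "ru"}  # placeholder object


# vars() returns the per-instance state, i.e. what travels with the object.
class_state = vars(ClassLevelLemmatizer())
instance_state = vars(InstanceLevelLemmatizer())
```

With the class-level version the per-instance state is empty, so a freshly started process that re-imports the module would see the class default rather than the value set at runtime.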
Adriane Boyd
b8e720fdb9
Fix Azerbaijani init, extend lang init tests (#8656)
* Extend langs in initialize tests

* Fix az init
2021-07-09 15:36:35 +02:00
Ines Montani
1c0ed22d1e
Merge pull request #8573 from julien-talkair/code-quality-pre-commit 2021-07-09 23:09:24 +10:00
Ines Montani
bbca56687f
Merge pull request #8655 from explosion/autoblack
Auto-format code with black
2021-07-09 23:08:05 +10:00
explosion-bot
334f1f98d8 Auto-format code with black 2021-07-09 08:06:06 +00:00
Adriane Boyd
1ee5bee29d
Add Macedonian models to website (#8637) 2021-07-08 09:32:14 +02:00
Paul O'Leary McCann
1d9209d43a
Merge pull request #8547 from mylibrar/update-universe
Add forte to universe.json
2021-07-08 14:59:49 +09:00
Ines Montani
39c8f7949e Add code preview for textcat_multilabel [ci skip] 2021-07-08 13:33:25 +10:00
Ines Montani
bcd2be40b5
Merge pull request #8634 from rynoV/patch-1 [ci skip] 2021-07-08 12:52:59 +10:00
Calum Sieppert
889c187bc2
Typo fixes 2021-07-07 16:53:04 -06:00
julien-talkair
833f7f2918 👷 configure flake8 pre-commit
* uses setup.cfg for flake8 configuration during pre-commit
2021-07-07 21:31:46 +02:00
Ines Montani
530b5d72f6
Merge pull request #8624 from adrianeboyd/docs/v3-1-usage-updates [ci skip]
Update v3.1 usage docs
2021-07-07 16:50:36 +10:00
Adriane Boyd
6db647dfe0 Update v3.1 usage docs 2021-07-07 08:43:33 +02:00
Sofie Van Landeghem
64fac754fe
add spacy prefix to ngram_suggester.v1 (#8623) 2021-07-07 08:09:30 +02:00
julien-talkair
82b01964fa 🚨 adjust flake8 sensitivity
* pass arguments to flake8
* reproduce arguments from CI config
2021-07-06 22:41:54 +02:00
Sofie Van Landeghem
733e8ceea9
fix spancat initialize with labels (#8620) 2021-07-06 19:08:25 +02:00
Sofie Van Landeghem
608fc1d623
avoid msg var implicitness (#8619)
* avoid msg var implicitness

* rename local msg

* Add CI tests for debug data and train

* Adjust debug data CLI test

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-07-06 19:08:08 +02:00
Sofie Van Landeghem
e7d747e3ee
TransitionBasedParser.v1 to legacy (#8586)
* TransitionBasedParser.v1 to legacy

* register sublayers

* bump spacy-legacy to 3.0.7
2021-07-06 15:26:45 +02:00
Ines Montani
04a9ade40f
Merge pull request #8466 from explosion/docs/new-in-v3-1 [ci skip] 2021-07-06 22:20:24 +10:00
Luca Dorigo
e8ef4a46d5
Add the right return type for Language.pipe and an overload for the as_tuples case (#8441)
* Add the right return type for Language.pipe and an overload for the as_tuples version

* Reformat, tidy up

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-07-06 14:18:40 +02:00
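The commit above mentions a return type plus an overload for the `as_tuples` case. A rough sketch of that typing pattern with `typing.overload`, assuming a hypothetical module-level `pipe` function that upper-cases strings instead of producing `Doc` objects (the signatures here are illustrative and are not copied from spaCy):

```python
from typing import Any, Iterable, Iterator, Literal, Tuple, overload


@overload
def pipe(texts: Iterable[str], *, as_tuples: Literal[False] = ...) -> Iterator[str]: ...


@overload
def pipe(
    texts: Iterable[Tuple[str, Any]], *, as_tuples: Literal[True]
) -> Iterator[Tuple[str, Any]]: ...


def pipe(texts, *, as_tuples=False):
    """Hypothetical stand-in: upper-cases text rather than running a pipeline."""
    if as_tuples:
        # Each item is a (text, context) pair; the context is passed through.
        for text, context in texts:
            yield text.upper(), context
    else:
        for text in texts:
            yield text.upper()


plain = list(pipe(["a", "b"]))
paired = list(pipe([("a", 1)], as_tuples=True))
```

The two `@overload` stubs exist only for type checkers, which can then infer a different element type depending on the literal value of `as_tuples`; the final definition is the one that runs.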
Sofie Van Landeghem
b9f59118bf
Fix silent evaluation (#8581)
* fix silentness

* sneak in docs typo fix

* pass silent boolean instead
2021-07-06 14:16:19 +02:00
Sofie Van Landeghem
3daf57d70c
Small spancat fixes (#8614)
* two small fixes + additional tests

* rename
2021-07-06 14:15:41 +02:00
Ines Montani
327f83573a
Move scores per type handling into util function (#8590) 2021-07-06 13:02:37 +02:00
Adriane Boyd
5fd0b5207e
Fix vectors check for sourced components (#8559)
* Fix vectors check for sourced components

Since vectors are not loaded when components are sourced, store a hash
for the vectors of each sourced component and compare it to the loaded
vectors after the vectors are loaded from the `[initialize]` block.

* Pop temporary info

* Remove stored hash in remove_pipe

* Add default for pop

* Add additional convert/debug/assemble CLI tests
2021-07-06 12:43:17 +02:00
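The commit above describes storing a hash of each sourced component's vectors and comparing it once the vectors are loaded. A minimal sketch of that compare-by-hash idea, assuming vectors are available as raw bytes and using hypothetical helper names (`vectors_hash`, `check_sourced_vectors`); how spaCy actually computes and stores the hash is not shown in the log:

```python
import hashlib


def vectors_hash(data: bytes) -> str:
    # Hypothetical helper: fingerprint the vector data.
    return hashlib.sha256(data).hexdigest()


def check_sourced_vectors(sourced_hashes, loaded_data):
    """Return the sourced components whose recorded hash differs from the loaded vectors."""
    loaded = vectors_hash(loaded_data)
    return [name for name, stored in sourced_hashes.items() if stored != loaded]


# Hashes recorded when the components were sourced (illustrative data only).
sourced = {
    "tagger": vectors_hash(b"vectors-a"),
    "parser": vectors_hash(b"vectors-b"),
}
mismatched = check_sourced_vectors(sourced, b"vectors-a")
```

Keeping only a hash means the check can be deferred until the vectors exist without holding a second copy of them in the meantime.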
Adriane Boyd
29906884c5
Raise an error for textcat with <2 labels (#8584)
* Raise an error for textcat with <2 labels

Raise an error if initializing a `textcat` component without at least
two labels.

* Add similar note to docs

* Update positive_label description in API docs
2021-07-06 12:35:22 +02:00
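The rule in the commit above, at least two labels for a `textcat` component, can be sketched as a small validation step. The function name and the error text are hypothetical and are not spaCy's own:

```python
def check_textcat_labels(labels):
    """Hypothetical validator; not spaCy's implementation or error message."""
    labels = list(labels)
    if len(labels) < 2:
        raise ValueError(
            f"textcat needs at least two labels, got {len(labels)}: {labels}"
        )
    return labels


valid = check_textcat_labels(["POSITIVE", "NEGATIVE"])
```

Failing at initialization surfaces the problem before any training time is spent on a classifier with nothing to choose between.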
Ines Montani
5bb7fe4b41 Update with HF hub integration [ci skip] 2021-07-06 19:30:59 +10:00
Paul O'Leary McCann
3b1d5350d0
Merge pull request #8609 from mathcass/model-documentation-typo
Fix a command typo in models.md
2021-07-06 14:43:58 +09:00
Cass
7d13fc799b
Fix a command typo in models.md
"dowmload" -> "download"
2021-07-05 18:44:18 -07:00