Commit Graph

15677 Commits

Author SHA1 Message Date
Julien Rossi
e117573822
Adding noun_chunks to the DUTCH language model (nl) (#8529)
*  implement noun_chunks for dutch language

* copy/paste FR and SV syntax iterators to accomodate UD tags
* added tests with dutch text
* signed contributor agreement

* 🐛 fix noun chunks generator

* built from scratch
* define noun chunk as a single Noun-Phrase
* includes some corner cases debugging (incorrect POS tagging)
* test with provided annotated sample (POS, DEP)

*  fix failing test

* CI pipeline did not like the added sample file
* add the sample as a pytest fixture

* Update spacy/lang/nl/syntax_iterators.py

* Update spacy/lang/nl/syntax_iterators.py

Code readability

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* Update spacy/tests/lang/nl/test_noun_chunks.py

correct comment

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>

* finalize code

* change "if next_word" into "if next_word is not None"

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-07-14 14:01:02 +02:00
Ines Montani
8ca6c58625 Merge pull request #8703 from thomashacker/update/spacy-stanza [ci skip]
Update spacy-stanza universe.json
2021-07-13 19:03:56 +10:00
Ines Montani
2a8eeed5da
Merge pull request #8703 from thomashacker/update/spacy-stanza [ci skip]
Update spacy-stanza universe.json
2021-07-13 19:03:42 +10:00
thomashacker
aafb89df78 Update universe.json code_example 2021-07-13 10:22:49 +02:00
KennethEnevoldsen
e5127992a0 added agreement 2021-07-13 10:11:02 +02:00
Kenneth Enevoldsen
94ce904e10
added missing comma 2021-07-13 09:59:34 +02:00
Kenneth Enevoldsen
a81fcc81b0
added dacy to universe 2021-07-13 09:54:08 +02:00
Adriane Boyd
f9fd2889b7
Use 0-vector for OOV lexemes (#8639) 2021-07-13 14:48:12 +10:00
Edward
8233359225
Fix preservation of spacy package meta (#8663)
* update package meta with existing_meta and nlp_meta

* Add spaCy contributor agreement

* Added more info when creating readme
2021-07-12 11:18:52 +02:00
Paul O'Leary McCann
1c70c87daf
Fix autoblack
The conditional needs double equals.
2021-07-10 16:02:39 +09:00
Ines Montani
616f4de034
Merge pull request #8674 from polm/fix/autoblack-no-forks [ci skip]
Make the autoblack job not run on forks
2021-07-10 16:41:59 +10:00
Paul O'Leary McCann
b8cdbb4bb6 Make the autoblack job not run on forks
The autoblack job is an occasional cleanup job. If it runs on forks and
those PRs are accepted the git history will be weird and that doesn't
help anyone.

The way to make the job not run on forks is a little non-obvious but
based on this thread.

https://github.com/prisma/prisma/issues/3539
2021-07-10 15:38:20 +09:00
Ines Montani
d8ae5750a6 Merge pull request #8665 from rynoV/patch-1 [ci skip] 2021-07-10 10:52:39 +10:00
Ines Montani
d4fecdfb82
Merge pull request #8665 from rynoV/patch-1 [ci skip] 2021-07-10 10:52:15 +10:00
Ines Montani
50000d37e4
Avoid double parentheses [ci skip] 2021-07-10 10:52:01 +10:00
Calum Sieppert
e2d53aa1a6
Typo fixes 2021-07-09 10:25:56 -06:00
Adriane Boyd
d8805a1073
Fix ru/uk lemmatizer mp with spawn (#8657)
Use an instance variable instead a class variable for the morphological
analzyer so that multiprocessing with spawn is possible.
2021-07-09 15:36:56 +02:00
Adriane Boyd
b8e720fdb9
Fix Azerbaijani init, extend lang init tests (#8656)
* Extend langs in initialize tests

* Fix az init
2021-07-09 15:36:35 +02:00
Ines Montani
1c0ed22d1e
Merge pull request #8573 from julien-talkair/code-quality-pre-commit 2021-07-09 23:09:24 +10:00
Ines Montani
bbca56687f
Merge pull request #8655 from explosion/autoblack
Auto-format code with black
2021-07-09 23:08:05 +10:00
explosion-bot
334f1f98d8 Auto-format code with black 2021-07-09 08:06:06 +00:00
Adriane Boyd
363230de19 Add Macedonian models to website (#8637) 2021-07-08 09:32:41 +02:00
Adriane Boyd
1ee5bee29d
Add Macedonian models to website (#8637) 2021-07-08 09:32:14 +02:00
Suqi Sun
20a2beafb5 Update pip 2021-07-08 15:09:52 +09:00
Suqi Sun
c61ecb6f7c Update pip and code example 2021-07-08 15:09:52 +09:00
Suqi Sun
f011126ebd Add forte to universe.json 2021-07-08 15:09:52 +09:00
Paul O'Leary McCann
1d9209d43a
Merge pull request #8547 from mylibrar/update-universe
Add forte to universe.json
2021-07-08 14:59:49 +09:00
Ines Montani
cdc0d669c1 Add code preview for textcat_multilabel [ci skip] 2021-07-08 13:33:33 +10:00
Ines Montani
39c8f7949e Add code preview for textcat_multilabel [ci skip] 2021-07-08 13:33:25 +10:00
Calum Sieppert
e7f8573e01 Typo fixes 2021-07-08 12:53:23 +10:00
Ines Montani
bcd2be40b5
Merge pull request #8634 from rynoV/patch-1 [ci skip] 2021-07-08 12:52:59 +10:00
Calum Sieppert
889c187bc2
Typo fixes 2021-07-07 16:53:04 -06:00
julien-talkair
833f7f2918 👷 configure flake8 pre-commit
* uses setup.cfg for flake8 configuration during pre-commit
2021-07-07 21:31:46 +02:00
Ines Montani
2cbe250381
Merge pull request #8631 from adrianeboyd/chore/update-spacy-io-v3-1 2021-07-08 00:33:17 +10:00
Adriane Boyd
c2fa62e690 Revert "Restrict website to 3.0 models until 3.1 release"
This reverts commit a13f29bb54.
2021-07-07 15:31:40 +02:00
Adriane Boyd
065e94a6eb Merge branch 'master' into spacy.io 2021-07-07 15:30:25 +02:00
Ines Montani
530b5d72f6
Merge pull request #8624 from adrianeboyd/docs/v3-1-usage-updates [ci skip]
Update v3.1 usage docs
2021-07-07 16:50:36 +10:00
Adriane Boyd
6db647dfe0 Update v3.1 usage docs 2021-07-07 08:43:33 +02:00
Sofie Van Landeghem
64fac754fe
add spacy prefix to ngram_suggester.v1 (#8623) 2021-07-07 08:09:30 +02:00
julien-talkair
82b01964fa 🚨 adjust flake8 sensitivity
* pass arguments to flake8
* reproduce arguments from CI config
2021-07-06 22:41:54 +02:00
Sofie Van Landeghem
733e8ceea9
fix spancat initialize with labels (#8620) 2021-07-06 19:08:25 +02:00
Sofie Van Landeghem
608fc1d623
avoid msg var impliciteness (#8619)
* avoid msg var impliciteness

* rename local msg

* Add CI tests for debug data and train

* Adjust debug data CLI test

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-07-06 19:08:08 +02:00
Sofie Van Landeghem
e7d747e3ee
TransitionBasedParser.v1 to legacy (#8586)
* TransitionBasedParser.v1 to legacy

* register sublayers

* bump spacy-legacy to 3.0.7
2021-07-06 15:26:45 +02:00
Ines Montani
04a9ade40f
Merge pull request #8466 from explosion/docs/new-in-v3-1 [ci skip] 2021-07-06 22:20:24 +10:00
Luca Dorigo
e8ef4a46d5
Add the right return type for Language.pipe and an overload for the as_tuples case (#8441)
* Add the right return type for Language.pipe and an overload for the as_tuples version

* Reformat, tidy up

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-07-06 14:18:40 +02:00
Sofie Van Landeghem
b9f59118bf
Fix silent evaluation (#8581)
* fix silentness

* sneak in docs typo fix

* pass silent boolean instead
2021-07-06 14:16:19 +02:00
Sofie Van Landeghem
3daf57d70c
Small spancat fixes (#8614)
* two small fixes + additional tests

* rename
2021-07-06 14:15:41 +02:00
Ines Montani
327f83573a
Move scores per type handling into util function (#8590) 2021-07-06 13:02:37 +02:00
Adriane Boyd
5fd0b5207e
Fix vectors check for sourced components (#8559)
* Fix vectors check for sourced components

Since vectors are not loaded when components are sourced, store a hash
for the vectors of each sourced component and compare it to the loaded
vectors after the vectors are loaded from the `[initialize]` block.

* Pop temporary info

* Remove stored hash in remove_pipe

* Add default for pop

* Add additional convert/debug/assemble CLI tests
2021-07-06 12:43:17 +02:00
Adriane Boyd
29906884c5
Raise an error for textcat with <2 labels (#8584)
* Raise an error for textcat with <2 labels

Raise an error if initializing a `textcat` component without at least
two labels.

* Add similar note to docs

* Update positive_label description in API docs
2021-07-06 12:35:22 +02:00