15686 Commits

Author SHA1 Message Date
Adriane Boyd
9ac6d4991e
Add doc_cleaner component (#9659)
* Add doc_cleaner component

* Fix types

* Fix loop

* Rephrase method description
2021-11-23 15:33:33 +01:00
Adriane Boyd
a77f50baa4
Allow Scorer.score_spans to handle pred docs with missing annotation (#9701)
If the predicted docs are missing annotation according to
`has_annotation`, treat the docs as having no predictions rather than
raising errors when the annotation is missing.

The motivation for this is a combined tokenization+sents scorer for a
component where the sents annotation is optional. To provide a single
scorer in the component factory, it needs to be possible for the scorer
to continue despite missing sents annotation in the case where the
component is not annotating sents.
2021-11-23 15:17:19 +01:00
Adriane Boyd
36c7047946
Use reference parse to initialize parser moves (#9722) 2021-11-23 14:55:55 +01:00
Paul O'Leary McCann
52b8c2d2e0
Add note on batch contract for listeners (#9691)
* Add note on batch contract

Using listeners requires batches to be consistent. This is obvious if
you understand how the listener works, but it wasn't clearly stated in
the Docs, and was subtle enough that the EntityLinker missed it.

There is probably a clearer way to explain what the actual requirement
is, but I figure this is a good start.

* Rewrite to clarify role of caching
2021-11-22 11:06:07 +01:00
Richard Hudson
a1f25412da
Edited Slovenian stop words list (#9707) 2021-11-22 09:46:34 +01:00
Sofie Van Landeghem
13645dcbf5
add note that annotating components is new since 3.1 (#9678) 2021-11-22 14:43:11 +09:00
Adriane Boyd
0e93b315f3
Convert labels to strings for README in package CLI (#9694) 2021-11-19 08:51:46 +01:00
Adriane Boyd
ea450d652c
Exclude strings from v3.2+ source vector checks (#9697)
Exclude strings from `Vectors.to_bytes()` comparisons for v3.2+ `Vectors`
that now include the string store so that the source vector comparison
is only comparing the vectors and not the strings.
2021-11-19 08:51:19 +01:00
Paul O'Leary McCann
f3981bd0c8
Clarify how to fill in init_tok2vec after pretraining (#9639)
* Clarify how to fill in init_tok2vec after pretraining

* Ignore init_tok2vec arg in pretraining

* Update docs, config setting

* Remove obsolete note about not filling init_tok2vec early

This seems to have also caught some lines that needed cleanup.
2021-11-18 15:38:30 +01:00
Vishnu Nandakumar
86fa37e8ba
Update universe.json with new library eng_spacysentiment (#9679)
* Update universe.json

* Update universe.json

* Cleanup fields

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2021-11-16 14:06:19 +09:00
Adriane Boyd
c9baf9d196
Fix spancat for empty docs and zero suggestions (#9654)
* Fix spancat for empty docs and zero suggestions

* Use ops.xp.zeros in test
2021-11-15 12:40:55 +01:00
Sofie Van Landeghem
4694b43d87
Merge pull request #9673 from explosion/master
update develop branch for 3.3
2021-11-15 11:14:49 +01:00
github-actions[bot]
67d8c8a081
Auto-format code with black (#9664)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2021-11-12 10:00:03 +01:00
Sofie Van Landeghem
24cdd4c88e
Merge pull request #9638 from polm/fix/optional-pretrain-path
Make Jsonl Corpus reader path optional again
2021-11-09 10:45:14 +01:00
Paul O'Leary McCann
8aa2d32ca9
Update jsonlcorpus constructor types 2021-11-09 16:20:19 +09:00
Paul O'Leary McCann
71fb00ed95
Update spacy/training/corpus.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-11-08 10:02:29 +00:00
Sofie Van Landeghem
c97f29c593
Merge pull request #9629 from ljvmiranda921/chore/migrate-regressions
Migrate regression and other tests to the new pytest marker
2021-11-08 09:07:38 +01:00
Paul O'Leary McCann
141f12b92e
Make Jsonl Corpus reader optional again 2021-11-07 18:56:23 +09:00
Lj Miranda
909177589d
Remove utility script 2021-11-06 06:35:58 +08:00
Ines Montani
86af0234ab
Update version [ci skip] 2021-11-05 19:02:35 +01:00
Adriane Boyd
216ed231a9
What's new in v3.2 (#9633)
* What's new in v3.2

* Fix formatting

* Fix typo

* Redo thanks

* Formatting

* Fix typo

* Fix project links

* Fix typo

* Minimal intro, floret python module

* Rephrase

* Rephrase, extend

* Rephrase

* Update links and formatting [ci skip]

* Minor correction

* Fix typo

Co-authored-by: Ines Montani <ines@ines.io>
2021-11-05 16:31:14 +01:00
Adriane Boyd
0fc3dee772
Merge pull request #9596 from adrianeboyd/tests/reenable-v3.2.0-tests
Reenable tests for v3.2.0
2021-11-05 10:54:30 +01:00
github-actions[bot]
5cdb7eb5c2
Auto-format code with black (#9631)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-11-05 09:58:36 +01:00
Adriane Boyd
e6f91b6f27
Format (#9630) 2021-11-05 09:56:26 +01:00
Lj Miranda
8e7deaf210
Add missing imports in some regression tests
- test_issue7001-8000.py
- test_issue8190.py
2021-11-05 11:47:59 +08:00
Lj Miranda
addeb34bc4
Decorate regression tests
Even if the issue number is already in the file, I still
decorated them just to follow the convention found in test_issue8168.py
2021-11-05 11:47:44 +08:00
Lj Miranda
91dec2c76e
Decorate non-regression tests 2021-11-05 11:47:33 +08:00
Lj Miranda
199943deb4
Add simple script to add pytest marks 2021-11-05 11:47:28 +08:00
Duygu Altinok
f0e8c9fe58
Spanish noun chunks review (#9537)
* updated syntax iters

* formatted the code

* added prepositional objects

* code clean up

* eliminated left attached adp

* added es vocab

* added basic tests

* fixed typo

* fixed typo

* list to set

* fixed doc name

* added code for conj

* more tests

* differentiated adjectives and flat

* fixed typo

* added compounds

* more compounds

* tests for compounds

* tests for nominal modifiers

* fixed typo

* fixed typo

* formatted file

* reformatted tests

* fixed typo

* fixed punct typo

* formatted after changes

* added indirect object

* added full sentence examples

* added longer full sentence examples

* fixed sentence length of test

* added passive subj

* added test case by Damian
2021-11-05 00:46:36 +01:00
Duygu Altinok
6e6650307d
Portuguese noun chunks review (#9559)
* added tests

* added pt vocab

* transferred spanish

* added syntax iters

* fixed parenthesis

* added nmod example

* added relative pron

* fixed rel pron

* added rel subclause

* corrected typo

* added more NP chains

* long sentence

* fixed typo

* fixed typo

* fixed typo

* corrected heads

* added passive subj

* added pass subj

* added passive obj

* refinement to rights

* went back to old

* fixed test

* fixed typo

* fixed typo

* formatted

* Format

* Format test cases

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-11-04 23:55:49 +01:00
Adriane Boyd
2bf52c44b1
Merge pull request #9612 from adrianeboyd/chore/switch-to-master-v3.2.0
Switch v3.2.0 to master
2021-11-03 16:27:34 +01:00
Adriane Boyd
07dea324f6
Merge remote-tracking branch 'upstream/develop' into chore/switch-to-master-v3.2.0 2021-11-03 15:32:18 +01:00
Bram Vanroy
cab9209c3d
use metaclass to decorate errors (#9593) 2021-11-03 15:29:32 +01:00
Paul O'Leary McCann
c1cc94a33a
Fix typo about receptive field size (#9564) 2021-11-03 15:16:55 +01:00
Adriane Boyd
e06bbf72a4
Fix tok2vec-less textcat generation in website quickstart (#9610) 2021-11-03 15:11:07 +01:00
Adriane Boyd
db0d8c56d0
Add test for Language.pipe as_tuples with custom error handlers (#9608)
* make nlp.pipe() return None docs when no exceptions are (re-)raised during error handling

* Remove changes other than as_tuples test

* Only check warning count for one process

* Fix types

* Format

Co-authored-by: Xi Bai <xi.bai.ed@gmail.com>
2021-11-03 10:57:34 +01:00
Adriane Boyd
79cea03983
Update website model display (#9589)
* Remove vectors from core trf model descriptions

* Update accuracy labels and exclude morph_acc for ja
2021-11-03 09:56:00 +01:00
Paul O'Leary McCann
e43639b27a
Add note about round-trip serializing pipeline to API docs (#9583) 2021-11-03 09:55:30 +01:00
Adriane Boyd
6eee024ff6
Pickle Doc._context (#9603) 2021-11-03 09:14:29 +01:00
Adriane Boyd
61daac54e4
Serialize _context separately in multiprocessing pipe (#9597)
* Serialize _context with Doc

* Revert "Serialize _context with Doc"

This reverts commit 161f1fac91.

* Serialize Doc._context separately for multiprocessing pipe
2021-11-03 07:51:53 +01:00
Adriane Boyd
5a979137a7
Set as_tuples on Doc during processing (#9592)
* Set as_tuples on Doc during processing

* Fix types

* Format
2021-11-02 15:08:22 +01:00
Adriane Boyd
c155f333bb
Revert "Temporarily use v3.1.0 models in CI"
This reverts commit bd6433bbab.
2021-11-02 14:25:05 +01:00
Adriane Boyd
53a3523910
Revert "Temporarily ignore W095 in assemble CLI CI test (#9460)"
This reverts commit 8db574e0b5.
2021-11-02 14:24:54 +01:00
Adriane Boyd
4d5db737e9
Revert "Temporarily skip compat tests (#9594)"
This reverts commit 667572adca.
2021-11-02 14:24:06 +01:00
Adriane Boyd
667572adca
Temporarily skip compat tests (#9594) 2021-11-02 14:10:48 +01:00
Lj Miranda
f1bc655a38
Add initial Tagalog (tl) tests (#9582)
* Add tl_tokenizer to test fixtures

* Add tagalog tests
2021-11-02 08:35:49 +01:00
xxyzz
90ec820f05
Add WordDumb to spaCy Universe (#9572)
* Add WordDumb to spaCy Universe

* Add standalone category

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2021-11-01 18:38:41 +09:00
Bruce W. Lee (이웅성)
a4dcb68cf6
Adding LingFeat Software to spaCy Universe. (#9574)
* add lingfeat in universe

* add lingfeat in universe

* Fix JSON

* Minor cleanup

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2021-11-01 18:38:14 +09:00
Vasundhara
5279c7c4ba
Fix broken link to mappings-exceptions (#9573) 2021-10-31 13:44:29 +09:00
Adriane Boyd
bb26550e22
Fix StaticVectors after floret+mypy merge (#9566) 2021-10-29 16:25:43 +02:00