15479 Commits

Author SHA1 Message Date
Adriane Boyd
ea450d652c
Exclude strings from v3.2+ source vector checks (#9697)
Exclude strings from `Vectors.to_bytes()` comparisons for v3.2+ `Vectors`
that now include the string store so that the source vector comparison
is only comparing the vectors and not the strings.
2021-11-19 08:51:19 +01:00
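For illustration only: once a vectors table also serializes its string store, a byte-level comparison of two tables compares their strings as well. The sketch below uses a hypothetical `ToyVectors` class (not spaCy's `Vectors`) whose `to_bytes()` takes an `exclude` list of component names, similar in spirit to the `exclude` argument on spaCy's serialization methods, so two tables with identical vectors but different strings compare equal once `"strings"` is excluded.

```python
import json
from typing import Callable, Dict, Iterable, List


class ToyVectors:
    """Hypothetical vectors table that, like v3.2+ `Vectors`, also stores strings."""

    def __init__(self, table: Dict[str, List[float]], strings: Iterable[str]) -> None:
        self.table = dict(table)     # key -> vector
        self.strings = set(strings)  # string store carried along with the vectors

    def to_bytes(self, exclude: Iterable[str] = ()) -> bytes:
        excluded = set(exclude)
        # One serializer per named component, so callers can skip components.
        serializers: Dict[str, Callable[[], object]] = {
            "strings": lambda: sorted(self.strings),
            "vectors": lambda: sorted(self.table.items()),
        }
        payload = {name: fn() for name, fn in serializers.items() if name not in excluded}
        # sort_keys makes the output deterministic for equal contents.
        return json.dumps(payload, sort_keys=True).encode("utf8")


a = ToyVectors({"cat": [0.1, 0.2]}, strings={"cat"})
b = ToyVectors({"cat": [0.1, 0.2]}, strings={"cat", "dog"})
print(a.to_bytes() == b.to_bytes())  # False: the string stores differ
print(a.to_bytes(exclude=["strings"]) == b.to_bytes(exclude=["strings"]))  # True
```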
Paul O'Leary McCann
f3981bd0c8
Clarify how to fill in init_tok2vec after pretraining (#9639)
* Clarify how to fill in init_tok2vec after pretraining

* Ignore init_tok2vec arg in pretraining

* Update docs, config setting

* Remove obsolete note about not filling init_tok2vec early

This seems to have also caught some lines that needed cleanup.
2021-11-18 15:38:30 +01:00
Vishnu Nandakumar
86fa37e8ba
Update universe.json with new library eng_spacysentiment (#9679)
* Update universe.json

* Update universe.json

* Cleanup fields

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2021-11-16 14:06:19 +09:00
Adriane Boyd
c9baf9d196
Fix spancat for empty docs and zero suggestions (#9654)
* Fix spancat for empty docs and zero suggestions

* Use ops.xp.zeros in test
2021-11-15 12:40:55 +01:00
Sofie Van Landeghem
4694b43d87
Merge pull request #9673 from explosion/master
update develop branch for 3.3
2021-11-15 11:14:49 +01:00
github-actions[bot]
67d8c8a081
Auto-format code with black (#9664)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2021-11-12 10:00:03 +01:00
Sofie Van Landeghem
24cdd4c88e
Merge pull request #9638 from polm/fix/optional-pretrain-path
Make Jsonl Corpus reader path optional again
2021-11-09 10:45:14 +01:00
Paul O'Leary McCann
8aa2d32ca9
Update jsonlcorpus constructor types
2021-11-09 16:20:19 +09:00
Paul O'Leary McCann
71fb00ed95
Update spacy/training/corpus.py
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-11-08 10:02:29 +00:00
Sofie Van Landeghem
c97f29c593
Merge pull request #9629 from ljvmiranda921/chore/migrate-regressions
Migrate regression and other tests to the new pytest marker
2021-11-08 09:07:38 +01:00
Paul O'Leary McCann
141f12b92e
Make Jsonl Corpus reader optional again
2021-11-07 18:56:23 +09:00
Lj Miranda
909177589d
Remove utility script
2021-11-06 06:35:58 +08:00
Ines Montani
86af0234ab
Update version [ci skip]
2021-11-05 19:02:35 +01:00
Adriane Boyd
216ed231a9
What's new in v3.2 (#9633)
* What's new in v3.2

* Fix formatting

* Fix typo

* Redo thanks

* Formatting

* Fix typo

* Fix project links

* Fix typo

* Minimal intro, floret python module

* Rephrase

* Rephrase, extend

* Rephrase

* Update links and formatting [ci skip]

* Minor correction

* Fix typo

Co-authored-by: Ines Montani <ines@ines.io>
2021-11-05 16:31:14 +01:00
Adriane Boyd
0fc3dee772
Merge pull request #9596 from adrianeboyd/tests/reenable-v3.2.0-tests
Reenable tests for v3.2.0
2021-11-05 10:54:30 +01:00
github-actions[bot]
5cdb7eb5c2
Auto-format code with black (#9631)
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-11-05 09:58:36 +01:00
Adriane Boyd
e6f91b6f27
Format (#9630)
2021-11-05 09:56:26 +01:00
Lj Miranda
8e7deaf210
Add missing imports in some regression tests
- test_issue7001-8000.py
- test_issue8190.py
2021-11-05 11:47:59 +08:00
Lj Miranda
addeb34bc4
Decorate regression tests
Even if the issue number is already in the file, I still
decorated them just to follow the convention found in test_issue8168.py
2021-11-05 11:47:44 +08:00
Lj Miranda
91dec2c76e
Decorate non-regression tests
2021-11-05 11:47:33 +08:00
Lj Miranda
199943deb4
Add simple script to add pytest marks
2021-11-05 11:47:28 +08:00
Duygu Altinok
f0e8c9fe58
Spanish noun chunks review (#9537)
* updated syntax iters

* formatted the code

* added prepositional objects

* code clean up

* eliminated left attached adp

* added es vocab

* added basic tests

* fixed typo

* fixed typo

* list to set

* fixed doc name

* added code for conj

* more tests

* differentiated adjectives and flat

* fixed typo

* added compounds

* more compounds

* tests for compounds

* tests for nominal modifiers

* fixed typo

* fixed typo

* formatted file

* reformatted tests

* fixed typo

* fixed punct typo

* formatted after changes

* added indirect object

* added full sentence examples

* added longer full sentence examples

* fixed sentence length of test

* added passive subj

* added test case by Damian
2021-11-05 00:46:36 +01:00
Duygu Altinok
6e6650307d
Portuguese noun chunks review (#9559)
* added tests

* added pt vocab

* transferred spanish

* added syntax iters

* fixed parenthesis

* added nmod example

* added relative pron

* fixed rel pron

* added rel subclause

* corrected typo

* added more NP chains

* long sentence

* fixed typo

* fixed typo

* fixed typo

* corrected heads

* added passive subj

* added pass subj

* added passive obj

* refinement to rights

* went back to old

* fixed test

* fixed typo

* fixed typo

* formatted

* Format

* Format test cases

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-11-04 23:55:49 +01:00
Adriane Boyd
2bf52c44b1
Merge pull request #9612 from adrianeboyd/chore/switch-to-master-v3.2.0
Switch v3.2.0 to master
2021-11-03 16:27:34 +01:00
Adriane Boyd
07dea324f6
Merge remote-tracking branch 'upstream/develop' into chore/switch-to-master-v3.2.0
2021-11-03 15:32:18 +01:00
Bram Vanroy
cab9209c3d
use metaclass to decorate errors (#9593)
2021-11-03 15:29:32 +01:00
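A minimal sketch of decorating error messages through a metaclass, in the spirit of #9593: looking up a message on the `Errors` class returns it prefixed with its code, so every raised error carries its code without repeating it in each string. The class names and messages below are illustrative assumptions, not copied from spaCy's `errors.py`.

```python
class ErrorsWithCodes(type):
    """Metaclass that prefixes each class-level message with its attribute name."""

    def __getattribute__(cls, code):
        msg = super().__getattribute__(code)
        # Leave Python's own dunder attributes (__name__, __doc__, ...) untouched.
        if code.startswith("__"):
            return msg
        return f"[{code}] {msg}"


class Errors(metaclass=ErrorsWithCodes):
    # Hypothetical example messages.
    E001 = "No component '{name}' found in pipeline."
    E002 = "Can't find factory for '{name}'."


print(Errors.E002.format(name="ner"))  # [E002] Can't find factory for 'ner'.
```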
Paul O'Leary McCann
c1cc94a33a
Fix typo about receptive field size (#9564)
2021-11-03 15:16:55 +01:00
Adriane Boyd
e06bbf72a4
Fix tok2vec-less textcat generation in website quickstart (#9610)
2021-11-03 15:11:07 +01:00
Adriane Boyd
db0d8c56d0
Add test for Language.pipe as_tuples with custom error handlers (#9608)
* make nlp.pipe() return None docs when no exceptions are (re-)raised during error handling

* Remove changes other than as_tuples test

* Only check warning count for one process

* Fix types

* Format

Co-authored-by: Xi Bai <xi.bai.ed@gmail.com>
2021-11-03 10:57:34 +01:00
Adriane Boyd
79cea03983
Update website model display (#9589)
* Remove vectors from core trf model descriptions

* Update accuracy labels and exclude morph_acc for ja
2021-11-03 09:56:00 +01:00
Paul O'Leary McCann
e43639b27a
Add note about round-trip serializing pipeline to API docs (#9583)
2021-11-03 09:55:30 +01:00
Adriane Boyd
6eee024ff6
Pickle Doc._context (#9603)
2021-11-03 09:14:29 +01:00
Adriane Boyd
61daac54e4
Serialize _context separately in multiprocessing pipe (#9597)
* Serialize _context with Doc

* Revert "Serialize _context with Doc"

This reverts commit 161f1fac91.

* Serialize Doc._context separately for multiprocessing pipe
2021-11-03 07:51:53 +01:00
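A minimal sketch of serializing a document and its context separately before crossing a process boundary, assuming a hypothetical `MiniDoc` whose `to_bytes()` omits user context (plain `pickle` round-trips stand in for the inter-process transfer). This illustrates the idea named in #9597; it is not spaCy's multiprocessing code.

```python
import pickle
from typing import Any


class MiniDoc:
    """Hypothetical document whose byte serialization omits `_context`."""

    def __init__(self, text: str, context: Any = None) -> None:
        self.text = text
        self._context = context  # user-supplied context, e.g. from as_tuples input

    def to_bytes(self) -> bytes:
        # Only core document data is serialized; the context is not included.
        return self.text.encode("utf8")

    @classmethod
    def from_bytes(cls, data: bytes) -> "MiniDoc":
        return cls(data.decode("utf8"))


def pack(doc: MiniDoc) -> bytes:
    # Worker side: send document bytes and context as two separate fields.
    return pickle.dumps((doc.to_bytes(), doc._context))


def unpack(payload: bytes) -> MiniDoc:
    # Receiving side: rebuild the document, then reattach its context.
    doc_bytes, context = pickle.loads(payload)
    doc = MiniDoc.from_bytes(doc_bytes)
    doc._context = context
    return doc


restored = unpack(pack(MiniDoc("Hello world", context={"id": 7})))
print(restored.text, restored._context)
```

Without the separate field, `MiniDoc.from_bytes(doc.to_bytes())` would come back with `_context` set to `None`.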
Adriane Boyd
5a979137a7
Set as_tuples on Doc during processing (#9592)
* Set as_tuples on Doc during processing

* Fix types

* Format
2021-11-02 15:08:22 +01:00
Adriane Boyd
c155f333bb
Revert "Temporarily use v3.1.0 models in CI"
This reverts commit bd6433bbab.
2021-11-02 14:25:05 +01:00
Adriane Boyd
53a3523910
Revert "Temporarily ignore W095 in assemble CLI CI test (#9460)"
This reverts commit 8db574e0b5.
2021-11-02 14:24:54 +01:00
Adriane Boyd
4d5db737e9
Revert "Temporarily skip compat tests (#9594)"
This reverts commit 667572adca.
2021-11-02 14:24:06 +01:00
Adriane Boyd
667572adca
Temporarily skip compat tests (#9594)
2021-11-02 14:10:48 +01:00
Lj Miranda
f1bc655a38
Add initial Tagalog (tl) tests (#9582)
* Add tl_tokenizer to test fixtures

* Add tagalog tests
2021-11-02 08:35:49 +01:00
xxyzz
90ec820f05
Add WordDumb to spaCy Universe (#9572)
* Add WordDumb to spaCy Universe

* Add standalone category

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2021-11-01 18:38:41 +09:00
Bruce W. Lee (이웅성)
a4dcb68cf6
Adding LingFeat Software to spaCy Universe. (#9574)
* add lingfeat in universe

* add lingfeat in universe

* Fix JSON

* Minor cleanup

Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2021-11-01 18:38:14 +09:00
Vasundhara
5279c7c4ba
Fix broken link to mappings-exceptions (#9573)
2021-10-31 13:44:29 +09:00
Adriane Boyd
bb26550e22
Fix StaticVectors after floret+mypy merge (#9566)
2021-10-29 16:25:43 +02:00
Adriane Boyd
322635e371
Set version to v3.2.0 (#9565)
2021-10-29 15:22:40 +02:00
Adriane Boyd
5e9db156c2
Merge pull request #9563 from adrianeboyd/chore/update-develop-from-master-v3.2-3
Update develop from master for v3.2
2021-10-29 14:08:14 +02:00
Adriane Boyd
2d430958e1
Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-3
2021-10-29 12:18:15 +02:00
Paul O'Leary McCann
006df1ae1f
Clarify error when words are of wrong type (#9541)
* Clarify error when words are of wrong type

See #9437

* Update docs

* Use try/except

* Apply suggestions from code review

Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-10-29 12:08:40 +02:00
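A minimal sketch of an up-front type check that reports which word has the wrong type, in the spirit of #9541 (see #9437). The helper name `check_words` and its message are hypothetical; where and how spaCy performs its own check is not shown here.

```python
from typing import Iterable, List


def check_words(words: Iterable[object]) -> List[str]:
    """Hypothetical helper: return `words` as a list of str or raise a clear TypeError."""
    checked: List[str] = []
    for i, word in enumerate(words):
        if not isinstance(word, str):
            # Name the offending item and its type instead of failing later
            # with an unrelated low-level error.
            raise TypeError(
                f"Expected each word to be a str, but item {i} is "
                f"{type(word).__name__}: {word!r}"
            )
        checked.append(word)
    return checked


print(check_words(["Hello", "world"]))
```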
Paul O'Leary McCann
2fd8d616e7
Add docs section for spacy.cli.train.train (#9545)
* Add section for spacy.cli.train.train

* Add link from training page to train function

* Ensure path in train helper

* Update docs

Co-authored-by: Ines Montani <ines@ines.io>
2021-10-29 10:36:34 +02:00
Adriane Boyd
5477453ea3
Docs for thinc-apple-ops (#9549)
* Docs for thinc-apple-ops

* Ignore thinc-apple-ops in reqs tests

* Fix install quickstart

* Add cupy cuda 113, 114 extras

* Remove draft section

Co-authored-by: Ines Montani <ines@ines.io>
2021-10-29 10:35:31 +02:00
Adriane Boyd
12974bf4d9
Add micro PRF for morph scoring (#9546)
* Add micro PRF for morph scoring

For pipelines where morph features are added by more than one component
and a reference training corpus may not contain all features, a micro
PRF score is more flexible than a simple accuracy score. An example is
the reading and inflection features added by the Japanese tokenizer.

* Use `morph_micro_f` as the default morph score for Japanese
morphologizers.

* Update docstring

* Fix typo in docstring

* Update Scorer API docs

* Fix results type

* Organize score list by attribute prefix
2021-10-29 10:29:29 +02:00
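Micro-averaged PRF pools true positives, false positives and false negatives over all features before computing the ratios, so a token with one extra or missing feature still earns partial credit. A minimal sketch, assuming each token's morphology is given as a set of hypothetical `"Field=Value"` strings; this is not spaCy's `Scorer` implementation of `morph_micro_f`.

```python
from typing import Dict, List, Set


def micro_prf(gold: List[Set[str]], pred: List[Set[str]]) -> Dict[str, float]:
    """Micro-averaged precision, recall and F-score over per-token feature sets."""
    tp = fp = fn = 0
    for gold_feats, pred_feats in zip(gold, pred):
        tp += len(gold_feats & pred_feats)  # features predicted and in the reference
        fp += len(pred_feats - gold_feats)  # predicted but absent from the reference
        fn += len(gold_feats - pred_feats)  # in the reference but not predicted
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return {"p": p, "r": r, "f": f}


gold = [{"Number=Sing", "Person=3"}, {"Tense=Past"}]
# The first prediction carries an extra feature the reference lacks.
pred = [{"Number=Sing", "Person=3", "Reading=Kare"}, {"Tense=Pres"}]
print(micro_prf(gold, pred))
```

Here exact-match token accuracy would be 0 (neither token matches completely), while the micro scores still credit the two shared features: P = 0.5, R ≈ 0.67, F ≈ 0.57.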