Adriane Boyd
ea450d652c
Exclude strings from v3.2+ source vector checks ( #9697 )
...
Exclude strings from `Vectors.to_bytes()` comparisons for v3.2+ `Vectors`,
which now include the string store, so that the source vector comparison
covers only the vectors and not the strings.
2021-11-19 08:51:19 +01:00
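The idea in the commit above can be sketched in plain Python. This is a minimal illustration, not spaCy's code: the payload layout, the field names, and the helpers `serialized_fields` and `same_source_vectors` are all hypothetical. It only shows why dropping a `strings` field before comparing makes the check depend on the vector data alone.

```python
from typing import Dict, Iterable


def serialized_fields(payload: Dict[str, bytes], exclude: Iterable[str] = ()) -> Dict[str, bytes]:
    """Return a copy of a serialized payload without the excluded fields."""
    skipped = set(exclude)
    return {name: data for name, data in payload.items() if name not in skipped}


def same_source_vectors(a: Dict[str, bytes], b: Dict[str, bytes]) -> bool:
    """Compare two payloads on vector data only, ignoring the string store."""
    return serialized_fields(a, exclude=["strings"]) == serialized_fields(b, exclude=["strings"])


# Hypothetical payloads: identical vector data, different string stores.
current = {"vectors": b"\x00\x01", "key2row": b"\x02", "strings": b'["apple"]'}
source = {"vectors": b"\x00\x01", "key2row": b"\x02", "strings": b'["apple", "pear"]'}

# A full comparison reports a difference that comes only from the strings.
print(current == source)
# The vectors-only comparison treats the two payloads as the same source.
print(same_source_vectors(current, source))
```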
Paul O'Leary McCann
f3981bd0c8
Clarify how to fill in init_tok2vec after pretraining ( #9639 )
...
* Clarify how to fill in init_tok2vec after pretraining
* Ignore init_tok2vec arg in pretraining
* Update docs, config setting
* Remove obsolete note about not filling init_tok2vec early
This seems to have also caught some lines that needed cleanup.
2021-11-18 15:38:30 +01:00
Vishnu Nandakumar
86fa37e8ba
Update universe.json with new library eng_spacysentiment ( #9679 )
...
* Update universe.json
* Update universe.json
* Cleanup fields
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2021-11-16 14:06:19 +09:00
Adriane Boyd
c9baf9d196
Fix spancat for empty docs and zero suggestions ( #9654 )
...
* Fix spancat for empty docs and zero suggestions
* Use ops.xp.zeros in test
2021-11-15 12:40:55 +01:00
github-actions[bot]
67d8c8a081
Auto-format code with black ( #9664 )
...
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
2021-11-12 10:00:03 +01:00
Sofie Van Landeghem
24cdd4c88e
Merge pull request #9638 from polm/fix/optional-pretrain-path
...
Make Jsonl Corpus reader path optional again
2021-11-09 10:45:14 +01:00
Paul O'Leary McCann
8aa2d32ca9
Update jsonlcorpus constructor types
2021-11-09 16:20:19 +09:00
Paul O'Leary McCann
71fb00ed95
Update spacy/training/corpus.py
...
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2021-11-08 10:02:29 +00:00
Sofie Van Landeghem
c97f29c593
Merge pull request #9629 from ljvmiranda921/chore/migrate-regressions
...
Migrate regression and other tests to the new pytest marker
2021-11-08 09:07:38 +01:00
Paul O'Leary McCann
141f12b92e
Make Jsonl Corpus reader optional again
2021-11-07 18:56:23 +09:00
Lj Miranda
909177589d
Remove utility script
2021-11-06 06:35:58 +08:00
Ines Montani
86af0234ab
Update version [ci skip]
2021-11-05 19:02:35 +01:00
Adriane Boyd
216ed231a9
What's new in v3.2 ( #9633 )
...
* What's new in v3.2
* Fix formatting
* Fix typo
* Redo thanks
* Formatting
* Fix typo
* Fix project links
* Fix typo
* Minimal intro, floret python module
* Rephrase
* Rephrase, extend
* Rephrase
* Update links and formatting [ci skip]
* Minor correction
* Fix typo
Co-authored-by: Ines Montani <ines@ines.io>
2021-11-05 16:31:14 +01:00
Adriane Boyd
0fc3dee772
Merge pull request #9596 from adrianeboyd/tests/reenable-v3.2.0-tests
...
Reenable tests for v3.2.0
2021-11-05 10:54:30 +01:00
github-actions[bot]
5cdb7eb5c2
Auto-format code with black ( #9631 )
...
Co-authored-by: explosion-bot <explosion-bot@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-11-05 09:58:36 +01:00
Adriane Boyd
e6f91b6f27
Format ( #9630 )
2021-11-05 09:56:26 +01:00
Lj Miranda
8e7deaf210
Add missing imports in some regression tests
...
- test_issue7001-8000.py
- test_issue8190.py
2021-11-05 11:47:59 +08:00
Lj Miranda
addeb34bc4
Decorate regression tests
...
Even if the issue number is already in the file, I still
decorated them just to follow the convention found in test_issue8168.py
2021-11-05 11:47:44 +08:00
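A rough sketch of how a script could add such decorators, assuming the marker is called `issue` and that regression tests are named `test_issue<number>`; neither detail is stated in this log, and `add_issue_marks` is a hypothetical helper, not the utility script from the repository.

```python
import re

# Matches lines such as "def test_issue8168(" or "def test_issue8168_extra(".
TEST_DEF = re.compile(r"^(?P<indent>[ \t]*)def test_issue(?P<number>\d+)\w*\(")


def add_issue_marks(source: str, marker: str = "issue") -> str:
    """Insert @pytest.mark.<marker>(<number>) above each matching test function."""
    out = []
    for line in source.splitlines():
        match = TEST_DEF.match(line)
        if match:
            decorator = f"{match['indent']}@pytest.mark.{marker}({match['number']})"
            # Look at the decorators already stacked directly above this function.
            start = len(out)
            while start > 0 and out[start - 1].lstrip().startswith("@"):
                start -= 1
            # Only add the mark if it is not there yet, so reruns change nothing.
            if decorator not in out[start:]:
                out.append(decorator)
        out.append(line)
    return "\n".join(out) + ("\n" if source.endswith("\n") else "")


sample = "import pytest\n\n\ndef test_issue8168():\n    assert True\n"
print(add_issue_marks(sample))
```

Running the helper twice on the same text leaves it unchanged after the first pass, which matters when a file already carries the decorator.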
Lj Miranda
91dec2c76e
Decorate non-regression tests
2021-11-05 11:47:33 +08:00
Lj Miranda
199943deb4
Add simple script to add pytest marks
2021-11-05 11:47:28 +08:00
Duygu Altinok
f0e8c9fe58
Spanish noun chunks review ( #9537 )
...
* updated syntax iters
* formatted the code
* added prepositional objects
* code clean up
* eliminated left attached adp
* added es vocab
* added basic tests
* fixed typo
* fixed typo
* list to set
* fixed doc name
* added code for conj
* more tests
* differentiated adjectives and flat
* fixed typo
* added compounds
* more compounds
* tests for compounds
* tests for nominal modifiers
* fixed typo
* fixed typo
* formatted file
* reformatted tests
* fixed typo
* fixed punct typo
* formatted after changes
* added indirect object
* added full sentence examples
* added longer full sentence examples
* fixed sentence length of test
* added passive subj
* added test case by Damian
2021-11-05 00:46:36 +01:00
Duygu Altinok
6e6650307d
Portuguese noun chunks review ( #9559 )
...
* added tests
* added pt vocab
* transferred spanish
* added syntax iters
* fixed parenthesis
* added nmod example
* added relative pron
* fixed rel pron
* added rel subclause
* corrected typo
* added more NP chains
* long sentence
* fixed typo
* fixed typo
* fixed typo
* corrected heads
* added passive subj
* added pass subj
* added passive obj
* refinement to rights
* went back to old
* fixed test
* fixed typo
* fixed typo
* formatted
* Format
* Format test cases
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-11-04 23:55:49 +01:00
Adriane Boyd
2bf52c44b1
Merge pull request #9612 from adrianeboyd/chore/switch-to-master-v3.2.0
...
Switch v3.2.0 to master
2021-11-03 16:27:34 +01:00
Adriane Boyd
07dea324f6
Merge remote-tracking branch 'upstream/develop' into chore/switch-to-master-v3.2.0
2021-11-03 15:32:18 +01:00
Bram Vanroy
cab9209c3d
use metaclass to decorate errors ( #9593 )
2021-11-03 15:29:32 +01:00
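For readers unfamiliar with the technique named in the commit title, here is a minimal sketch of a metaclass that decorates error messages on attribute access. The class names `CodePrefixMeta` and `DemoErrors` and the message text are hypothetical; the log does not show the actual implementation.

```python
class CodePrefixMeta(type):
    """Metaclass that prefixes class-level messages with their attribute name."""

    def __getattribute__(cls, name):
        value = super().__getattribute__(name)
        # Leave dunder attributes such as __name__ or __dict__ untouched.
        if name.startswith("__"):
            return value
        return f"[{name}] {value}"


class DemoErrors(metaclass=CodePrefixMeta):
    # Hypothetical error message, stored without its code.
    E001 = "No component '{name}' found."


print(DemoErrors.E001.format(name="ner"))
```

The appeal of this design is that the error code is written once, as the attribute name, and the prefix is applied uniformly at lookup time instead of being repeated in every message string.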
Paul O'Leary McCann
c1cc94a33a
Fix typo about receptive field size ( #9564 )
2021-11-03 15:16:55 +01:00
Adriane Boyd
e06bbf72a4
Fix tok2vec-less textcat generation in website quickstart ( #9610 )
2021-11-03 15:11:07 +01:00
Adriane Boyd
db0d8c56d0
Add test for Language.pipe as_tuples with custom error handlers ( #9608 )
...
* make nlp.pipe() return None docs when no exceptions are (re-)raised during error handling
* Remove changes other than as_tuples test
* Only check warning count for one process
* Fix types
* Format
Co-authored-by: Xi Bai <xi.bai.ed@gmail.com>
2021-11-03 10:57:34 +01:00
Adriane Boyd
79cea03983
Update website model display ( #9589 )
...
* Remove vectors from core trf model descriptions
* Update accuracy labels and exclude morph_acc for ja
2021-11-03 09:56:00 +01:00
Paul O'Leary McCann
e43639b27a
Add note about round-trip serializing pipeline to API docs ( #9583 )
2021-11-03 09:55:30 +01:00
Adriane Boyd
6eee024ff6
Pickle Doc._context ( #9603 )
2021-11-03 09:14:29 +01:00
Adriane Boyd
61daac54e4
Serialize _context separately in multiprocessing pipe ( #9597 )
...
* Serialize _context with Doc
* Revert "Serialize _context with Doc"
This reverts commit 161f1fac91.
* Serialize Doc._context separately for multiprocessing pipe
2021-11-03 07:51:53 +01:00
Adriane Boyd
5a979137a7
Set as_tuples on Doc during processing ( #9592 )
...
* Set as_tuples on Doc during processing
* Fix types
* Format
2021-11-02 15:08:22 +01:00
Adriane Boyd
c155f333bb
Revert "Temporarily use v3.1.0 models in CI"
...
This reverts commit bd6433bbab.
2021-11-02 14:25:05 +01:00
Adriane Boyd
53a3523910
Revert "Temporarily ignore W095 in assemble CLI CI test ( #9460 )"
...
This reverts commit 8db574e0b5.
2021-11-02 14:24:54 +01:00
Adriane Boyd
4d5db737e9
Revert "Temporarily skip compat tests ( #9594 )"
...
This reverts commit 667572adca.
2021-11-02 14:24:06 +01:00
Adriane Boyd
667572adca
Temporarily skip compat tests ( #9594 )
2021-11-02 14:10:48 +01:00
Lj Miranda
f1bc655a38
Add initial Tagalog (tl) tests ( #9582 )
...
* Add tl_tokenizer to test fixtures
* Add tagalog tests
2021-11-02 08:35:49 +01:00
xxyzz
90ec820f05
Add WordDumb to spaCy Universe ( #9572 )
...
* Add WordDumb to spaCy Universe
* Add standalone category
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2021-11-01 18:38:41 +09:00
Bruce W. Lee (이웅성)
a4dcb68cf6
Adding LingFeat Software to spaCy Universe. ( #9574 )
...
* add lingfeat in universe
* add lingfeat in universe
* Fix JSON
* Minor cleanup
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
2021-11-01 18:38:14 +09:00
Vasundhara
5279c7c4ba
Fix broken link to mappings-exceptions ( #9573 )
2021-10-31 13:44:29 +09:00
Adriane Boyd
bb26550e22
Fix StaticVectors after floret+mypy merge ( #9566 )
2021-10-29 16:25:43 +02:00
Adriane Boyd
322635e371
Set version to v3.2.0 ( #9565 )
2021-10-29 15:22:40 +02:00
Adriane Boyd
5e9db156c2
Merge pull request #9563 from adrianeboyd/chore/update-develop-from-master-v3.2-3
...
Update develop from master for v3.2
2021-10-29 14:08:14 +02:00
Adriane Boyd
2d430958e1
Merge remote-tracking branch 'upstream/master' into chore/update-develop-from-master-v3.2-3
2021-10-29 12:18:15 +02:00
Paul O'Leary McCann
006df1ae1f
Clarify error when words are of wrong type ( #9541 )
...
* Clarify error when words are of wrong type
See #9437
* Update docs
* Use try/except
* Apply suggestions from code review
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>
2021-10-29 12:08:40 +02:00
Paul O'Leary McCann
2fd8d616e7
Add docs section for spacy.cli.train.train ( #9545 )
...
* Add section for spacy.cli.train.train
* Add link from training page to train function
* Ensure path in train helper
* Update docs
Co-authored-by: Ines Montani <ines@ines.io>
2021-10-29 10:36:34 +02:00
Adriane Boyd
5477453ea3
Docs for thinc-apple-ops ( #9549 )
...
* Docs for thinc-apple-ops
* Ignore thinc-apple-ops in reqs tests
* Fix install quickstart
* Add cupy cuda 113, 114 extras
* Remove draft section
Co-authored-by: Ines Montani <ines@ines.io>
2021-10-29 10:35:31 +02:00
Adriane Boyd
12974bf4d9
Add micro PRF for morph scoring ( #9546 )
...
* Add micro PRF for morph scoring
For pipelines where morph features are added by more than one component
and a reference training corpus may not contain all features, a micro
PRF score is more flexible than a simple accuracy score. An example is
the reading and inflection features added by the Japanese tokenizer.
* Use `morph_micro_f` as the default morph score for Japanese
morphologizers.
* Update docstring
* Fix typo in docstring
* Update Scorer API docs
* Fix results type
* Organize score list by attribute prefix
2021-10-29 10:29:29 +02:00
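The motivation given above can be illustrated with a small, self-contained sketch of micro-averaged precision, recall, and F-score over individual morphological features. It is not the spaCy `Scorer` code: the function `micro_prf` and the data layout (one set of `(token_index, "Feat=Value")` pairs per document) are assumptions made for the example.

```python
from typing import Iterable, Set, Tuple

Feature = Tuple[int, str]  # (token index, "Feat=Value")


def micro_prf(gold_docs: Iterable[Set[Feature]], pred_docs: Iterable[Set[Feature]]):
    """Pool true/false positives and false negatives over all features, then score."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_docs, pred_docs):
        tp += len(gold & pred)   # features present in both
        fp += len(pred - gold)   # predicted but not in the reference
        fn += len(gold - pred)   # in the reference but not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f_score


# Hypothetical two-token document: the prediction misses one reference feature
# and adds one feature (a reading) that the reference corpus does not annotate.
gold = [{(0, "Case=Nom"), (0, "Number=Sing"), (1, "Tense=Past")}]
pred = [{(0, "Case=Nom"), (1, "Tense=Past"), (1, "Reading=X")}]

p, r, f = micro_prf(gold, pred)
print(round(p, 3), round(r, 3), round(f, 3))
```

In this example neither token matches the reference exactly, so a whole-token accuracy would count both as wrong, while the micro scores still give credit for the two features that agree.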
Philip Vollet
76173b0866
fixed typo and URL ( #9560 )
2021-10-29 13:57:44 +09:00